ABSTRACT


In this thesis we are concerned with multiple-decision problems
involving the selection of a variate, or of a set of variates, corresponding
to the "best" (in a specified sense) parameter of interest,
in a multivariate statistical context, in the presence of nuisance
parameters. Our main concern is with the rational choice of sample
size, when single-stage procedures are employed; all problems are
treated using the indifference-zone and subset approaches. We require
of these procedures that they guarantee a stipulated probability
requirement. In order to determine the sample size necessary to
achieve this objective using a single-stage procedure, it is first
necessary to minimize the probability of a correct selection associated
with the procedure, with respect to the parameters of interest (in a
specified region of the parameter space) and the nuisance parameters
(for all possible values of these parameters).

Our  objective  at  the  outset  of  research  in  the  present  thesis 
was  to  provide  a  solution  to  the  problem  of  selecting  the  best  subclass 
of  predictors  for  a  specified  subclass  of  variates.  (This  is  accomplished 
in  Chapter  4.)  We  soon  realized  that  this  problem  is  intimately 
connected with other selection problems involving covariance matrices
of multivariate normal distributions. Therefore, Chapters 2, 3 and 4
are  very  closely  related,  while  Chapter  1,  although  related  to  these 
chapters,  treats  a  different  topic. 

In  Chapter  1,  we  consider  the  problem  of  selecting  the 
variate  associated  with  the  largest  population  mean,  in  a  multivariate 
normal  population,  with  unknown  population  means,  known  (unknown) 
population  variances,  and  unknown  population  correlations. 

In  Chapter  2,  we  consider  the  problem  of  selecting  the  component 
associated  with  the  smallest  population  variance,  in  a  multivariate 
normal  population,  with  totally  unknown  parameters. 

The  results  of  Chapter  2  are  extended  in  Chapter  3  to  some 
selection  problems  concerning  generalized  variances  in  multivariate 
normal  populations.  The  results  of  this  chapter  involve  large-sample 
(asymptotic)  theory. 

Finally,  in  Chapter  4,  we  solve  (using  asymptotic  theory) 
two  problems  which  have  aroused  recent  interest  in  the  literature. 

The  first  is  that  of  selecting  the  multivariate  normal  population 
(among  independent  populations),  with  the  smallest  vector  coefficient 
of  alienation  between  two  sets  of  components.  Gupta  and  Panchapakesan 
(1969)  and  Rizvi  and  Solomon  (1973)  give  different  formulations  and 
solutions  for  this  problem. 

Secondly,  and  perhaps  more  importantly  from  the  viewpoint  of 
applications,  we  consider  the  problem  of  selecting  the  best  subclass 
of  predictors  for  a  fixed  subclass  of  variates,  each  of  the  contending 
subclasses  being  correlated  with  the  subclass  previously  specified. 

This  problem  is  treated  in  a  multivariate  normal  context,  and  a 
quite general asymptotic solution is displayed. The vector coefficient
of alienation is used as a measure of association. Ramberg (1969) and
Arvensen  (1971)  obtained  partial  solutions  for  related  problems.  All 
asymptotic  results  of  Chapters  2-4  are  valid  under  quite  general 
families  of  multivariate  distributions,  although,  for  simplicity,  we 
have stated them under normality assumptions.

13. ABSTRACT


The  following  statistical  multiple-decision  problems  are  considered  for  a 
multivariate  normal  distribution  with  unknown  (or  partially  known)  covariance 
matrix,  using  the  indifference-zone  and  subset  approaches:  a)  selecting  the 
variate  with  the  largest  population  mean;  b)  selecting  the  variate  with  the 
smallest  population  variance;  c)  selecting  the  subclass  of  variates  with  the 
smallest  population  generalized  variance;  d)  selecting  the  population  with  the 
smallest  vector  coefficient  of  alienation  between  two  subclasses  of  variates; 
e)  selecting  the  best  subclass  of  predictors  for  a  specified  subclass  of  variates. 
Small-sample theory is employed in a) and b), while large-sample theory is used
in b), c), d) and e).

HISTORICAL REMARKS


The  birth  and  development  of  the  idea  of  treating  certain 
statistical  problems  as  decision  problems  is  generally  credited  to 
A.  Wald.  His  work  culminated  with  the  publication  of  the  book 
Statistical Decision Functions (see Wald (1950)).

The  first  instances  of  multiple-decision  problems,  with 
some  bearing  on  the  present  thesis,  may  be  traced  back  to  this  period. 
In  particular,  we  should  mention  the  work  of  Paulson  (1949,  1952a, 
1952b)  who  treated  classification  schemes,  comparison  with  a  control 
and  the  "slippage"  problem.  Bahadur  (1950)  and  Bahadur  and  Goodman 
(see  also  Lehmann  (1957,  1961,  1966)  and  Eaton  (1967a)),  proved 
strong  optimality  properties  for  "natural"  selection  procedures,  when 
the  experimenter  is  interested  in  selecting  the  "best"  population. 

Bechhofer (1954) wrote a pioneering paper in which he defined
precisely several possible ranking and selection goals as alternatives
to classical tests of homogeneity. In this paper, the idea of planning
the sample size using an indifference-zone approach with the purpose
of  guaranteeing  a  specified  probability  of  a  correct  selection  or 
ranking  was  set  forth. 

Somerville  (1954)  considered  a  selection  problem,  with  explicit 
reference  to  the  use  of  the  category  selected  after  the  decision 
process. In planning the initial experiment, he considered loss functions
which "take into consideration the amount of use to be made of
the result, the cost of making a wrong decision and the cost of sampling".
A minimax criterion was used.

W.  J.  Hall  (1958,  1959)  introduced  the  notion  of  most  economical 
multiple  decision  rules  (roughly,  rules  which  require  the  smallest 
sample  sizes  to  achieve  a  certain  objective).  He  then  proved  the 
most  economical  character  of  some  of  Bechhofer's  rules. 

Dunnett  (1960)  proposed  selection  procedures  for  normal  means, 
introducing  prior  distributions  on  the  means,  and  assuming  a  known 
and  particular  covariance  matrix.  After  a  rather  complete  analysis 
without  loss  functions,  he  introduced  linear  loss  functions  and 
invoked  a  minimax  criterion,  as  in  Somerville  (1954),  and  other 
criteria,  such  as  minimizing  the  maximum  regret. 

Gupta  (1956)  introduced  the  subset  selection  approach,  in 
which  the  experimenter's  goal  is  to  select  a  subset  of  variates,  including 
the  best  one.  In  many  practical  situations,  these  may  be  regarded 
as  screening  procedures,  to  be  used  in  the  presence  of  a  large  number 
of  variates,  before  one  demands  the  selection  of  a  best  one. 

Much of the literature on multiple-decision (selection and
ranking) procedures since then has been concerned with the indifference-zone
and subset approaches. The most important development using
indifference-zone ideas is perhaps the monograph Sequential Identification
and Ranking Procedures by Bechhofer, Kiefer and Sobel (1968),
in  which  sequential  procedures  for  ranking  parameters  of  Koopman- 
Darmois  populations  are  treated.  This  book  also  contains  a  rather 
complete  survey  of  the  field.  The  reader  may  consult  it  for  references 
to  practically  all  of  the  literature  up  to  1968. 


The following papers, using the indifference-zone approach,
are particularly relevant to the present thesis:

Bechhofer  and  Sobel  (1954)  considered  the  problem  of  ranking 
population  variances  for  independent  normal  variates; 

Bechhofer (1968) studied ranking problems arising in connection
with multiply-classified variances and a multiplicative model
for these variances;

Paulson  (1964)  gave  a  closed  fully  sequential  procedure,  which 
eliminates  noncontending  populations,  for  the  problem  of  selecting  the 
normal  population  with  the  largest  population  mean,  when  the  common 
population  variance  is  known  or  unknown; 

Ramberg (1969) considered the problem of finding a best set
of predictors for a specified variate, in a multivariate normal context;

Rizvi and Solomon (1973) considered the problem of selecting
the population with the largest population multiple correlation coefficient
between a specified variate and a set of variates.

In  the  area  of  subset  selection  procedures,  the  reader  is 
referred  to  the  papers  of  Gupta  (1965)  and  Gupta  and  Panchapakesan 
(1972)  wherein  there  are  given  rather  broad  surveys  of  the  main 
results,  and  many  of  the  important  references. 

The  following  papers,  using  the  subset  approach,  are  important 
to  this  thesis: 

Gupta  and  Sobel  (1962)  considered  the  problem  of  selecting  a 
subset  of  normal  variates  containing  the  variate  with  the  smallest 
population  variance; 

Gupta and Panchapakesan (1969) considered problems of selection
in terms of multiple correlation coefficients and conditional
generalized variances;

Arvensen (1971) considered the problem of selecting a subset
of subclasses of variates containing the best predictor subclass,
and used a Bayesian approach.

Finally,  there  are  several  papers  which  employ  different 
formulations  for  selection  and  ranking  problems.  Among  these,  we 
mention Fabian (1962) and Mahamunulu (1966, 1967). Recently, Gupta and
Santner (1972) proposed a multiple-decision procedure which selects
a subset of size not exceeding a specified upper bound; their procedure
bridges the indifference-zone and subset approaches.


STATEMENT  OF  PROBLEMS 


In this section we formulate the problems of interest to us
in a general enough framework for our purposes. Let
$X = (X_1, \ldots, X_k)$ be a random vector with distribution function
$F_X(\cdot \mid \theta, \phi)$, where $\theta = (\theta_1, \ldots, \theta_k)$ and
$\phi = (\phi_1, \phi_2, \ldots)$, each $\theta_i$ and $\phi_j$ being unknown scalars.
Our major interest is in the $\theta_i$, while the $\phi_j$ are regarded as
nuisance parameters. Let $\theta_{[1]} \le \ldots \le \theta_{[k]}$ be the ranked
values of the elements of the vector $\theta$. We will say
that $X_i$ is associated with $\theta_i$ if the marginal distribution of $X_i$
depends on $\theta_i$ and not on $\{\theta_j,\ j \ne i\}$. It is assumed that no
prior knowledge exists concerning the pairing of the $\theta_{[j]}$ with the
$X_i$ $(1 \le i, j \le k)$.


Indifference-zone formulation

Our goal, when using the indifference-zone approach, will be
to select the variate $X_i$ associated with $\theta_{[k]}$. For this goal,
we permit only $k$ possible decisions, namely "$X_i$ $(1 \le i \le k)$ is
associated with $\theta_{[k]}$." There are many other ranking goals treated
in the literature, but we will consider only this one in the present
thesis. Here correct selection means selection of the variate associated
with $\theta_{[k]}$ (or of any one of $\theta_{[q]}, \theta_{[q+1]}, \ldots, \theta_{[k]}$ if
$\theta_{[q]} = \theta_{[k]}$).

The probability requirement associated with this goal is not
completely formulated until a "distance" function $\psi(\theta_i, \theta_j)$, between
the marginal distributions of $X_i$ and $X_j$, is adopted. We assume $\psi$
to satisfy:

$\psi(a,b) \ge 0$ for all pairs $(a,b)$;
$\psi(a,b) = 0$ iff $a = b$;
$\psi(a,b) = \psi(b,a)$;
$\psi(a,b)$ is strictly increasing in $a$ for fixed $b$, and
strictly decreasing in $b$ for fixed $a$,
if $a \ge b$.

The  specification  of  this  distance  function  is  fundamental 
when using the indifference-zone approach. Bechhofer, Kiefer and
Sobel  (1968)  showed  that,  in  certain  problems,  the  adoption  of  a 
particular  distance  function  implies  the  nonexistence  of  a  single  or 
multi-stage  procedure  which  will  guarantee  the  probability  requirement 
(to  be  defined  shortly) . 

The experimenter specifies real constants $\{\delta^*, P^*\}$, $\delta^* > 0$,
$1/k < P^* < 1$, prior to experimentation. For example, if the $\theta_i$ are
location parameters in the marginal distribution of $X_i$ $(1 \le i \le k)$,
we may take $\psi(a,b) = a - b$. If the $\theta_i$ are scale parameters, we
may use $\psi(a,b) = \log(a/b)$.

When there exists a decision Rule R which guarantees the
probability requirement,

\[ \inf_{\Omega} P_{\theta,\phi}(\text{Correct selection using } R) \ge P^* , \]

where

\[ \Omega = \{(\theta,\phi) \mid \psi(\theta_{[k]}, \theta_{[k-1]}) \ge \delta^*\} , \]

we say that R provides a solution to the selection problem relative
to the distance function $\psi$. $\Omega$ is called the preference zone, and
all parameter points not in $\Omega$ are said to be in the indifference
zone. When the experimenter adopts this approach he states in effect
that, for all parameter points not in $\Omega$, he is indifferent as to
which decision is made. Any point $(\theta,\phi)$ for which the infimum is
attained is called a least favorable configuration of the parameters.

Usually,  we  define  R  =  R(N)  ,  a  function  of  the  sample 
size  N  .  Then  we  determine  the  smallest  N  necessary  to  guarantee 
the  above  probability  requirement  when  R(N)  is  employed. 


Subset  formulation 


Another possible goal is to select a subset of variates $X_i$
$(1 \le i \le k)$ containing a variate associated with $\theta_{[k]}$. There are
$2^k - 1$ possible decisions, namely all nonempty subsets of $\{X_1, \ldots, X_k\}$.
When using the so-called subset approach there is no need to consider
distance functions; instead, the experimenter specifies $\{P^*\}$,
$1/k < P^* < 1$ before experimentation starts. Then, if correct selection
means selection of a subset of variates containing a variate associated
with $\theta_{[k]}$, Rule R is said to provide a solution to the selection
problem if it guarantees the probability requirement,

\[ \inf_{\theta,\phi} P_{\theta,\phi}(\text{Correct selection using } R) \ge P^* . \]

In the problems we consider, R = R(N) is a function of the
sample size N, and of $d^*$, which is a specified "yardstick." Our
method will be to fix $d^*$, and then find the smallest N such that the
probability requirement is guaranteed, when R(N) is employed. This is
in contrast with the usual formulation of such problems using the subset
approach, where N is fixed and $d^*$ is found to guarantee the same
probability requirement. It will be seen that the mathematical
problems are equivalent, and our approach is taken just as a matter
of convenience.

A  few  words  about  notation.  Correct  selection  will  always 
mean  a  selection  for  which  the  goal  under  consideration  is  achieved. 

PCS denotes probability of a correct selection; a.d. stands for
asymptotic distribution. PCS^ denotes PCS, and E^ the expectation
operator, when an a.d. theory is employed.

CHAPTER 1


SELECTION  OF  THE  NORMAL  VARIATE  WITH  THE  LARGEST  POPULATION 
MEAN  FROM  A  SINGLE  MULTIVARIATE  NORMAL  POPULATION 
WITH  COMMON  KNOWN  VARIANCES 


1.0.  Introduction 

In  most  of  the  present  chapter  we  consider  a  k-variate  normal 
population  and  propose  single-stage  procedures  for  selecting  the 
component  with  the  largest  population  mean.  We  assume  throughout 
that  the  population  variances  are  common  and  known. 

Section 1.1 gives certain preliminaries including a statement
of an indifference-zone and a subset formulation of the problem, which
we later treat simultaneously. In Section 1.2 we consider, for
$k \ge 3$, the simple special case of equal but unknown population
correlations. The case $k = 2$ is treated in Section 1.3. For
$k = 3$, we show in Section 1.4 that the theory is quite involved, but
still tractable; exact small-sample results are obtained. However,
for $k > 3$, only tentative results are available; these are given in
Section 1.5. In Section 1.6 we use Bonferroni's inequality to determine
a conservative approximation to the sample size required to
guarantee the probability requirement for the general $k \ge 3$ case.
Finally, in Section 1.7, we show that Paulson's (1964) sequential
procedure can be modified slightly to apply to the indifference-zone
formulation of the problem described in this chapter.

The  most  interesting  results  of  the  present  chapter,  when 
single-stage procedures are used, are the following: a) The fact
that the least favorable configuration of the correlation matrix
depends  on  the  sample  size;  b)  Using  "natural"  procedures  (i.e., 
the  same  procedures,  based  only  on  sample  means,  that  have  been  used 
for  independent  components),  the  probability  of  a  correct  selection 
can  attain  values  less  than  1/k  ,  when  the  sample  size  is  small; 
therefore,  these  "natural"  procedures  are  not  minimax  when  this 
situation  obtains. 


1.1. Preliminaries

Consider a k-variate normal population $X' = (X_1, \ldots, X_k)$ with
population mean vector $\mu' = (\mu_1, \ldots, \mu_k)$ and population covariance
matrix $\sigma^2 R$. We assume that $\sigma^2$ is the common known population
variance, while $R = (\rho_{ij})$ is the unknown population correlation
matrix. Let $\mu_{[1]} \le \ldots \le \mu_{[k]}$ be the ranked values of the $\mu_i$. We
assume no prior knowledge concerning the values of the $\mu_i$, or of the
pairing of the $\mu_{[j]}$ with the variates $X_i$ $(1 \le i, j \le k)$.


Indifference-zone formulation

The experimenter's goal is to select the variate associated
with $\mu_{[k]}$. The experimenter specifies constants $\{\delta^*, P^*\}$,
$\delta^* > 0$, $1/k < P^* < 1$, prior to the start of experimentation.
Let $PCS_R(\mu, R)$ denote the probability of a correct selection using
decision procedure R, when $\mu$ and $R$ are the unknown set of
parameters. We limit consideration to decision procedures R which
guarantee the probability requirement:

\[ \inf_{\Omega} PCS_R(\mu, R) \ge P^* , \]

where

\[ \Omega = \{(\mu, R) \mid \mu_{[k]} - \mu_{[k-1]} \ge \delta^* ,\ R \text{ a correlation matrix}\} . \]

Most of the present chapter will be concerned with single-stage
procedures. For such procedures, the experimenter takes a sample
of N independent vector observations, $X_\alpha' = (X_{1\alpha}, \ldots, X_{k\alpha})$
$(\alpha = 1, \ldots, N)$. The following decision rule has been proposed for
this indifference-zone formulation of the problem:

Rule B: Let $\bar X_j = \sum_{\alpha=1}^{N} X_{j\alpha} / N$ $(1 \le j \le k)$. Then assert that
the variate associated with $\bar X_{[k]} = \max\{\bar X_1, \ldots, \bar X_k\}$ has the largest
population mean.

The  problem  is  to  determine  the  smallest  value  of  the  integer 
N  for  which  the  probability  requirement  is  guaranteed  if  Rule  B 
is  employed. 
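Rule B as stated above can be sketched in a few lines. This is only an illustration: the function name `rule_b`, the tie-breaking convention and the sample data are assumptions made here, not part of the original procedure.

```python
from statistics import fmean


def rule_b(observations):
    """Apply Rule B to N vector observations of k variates each.

    Returns (selected, means): the 0-based index of the variate with the
    largest sample mean, and the list of all k sample means.
    Ties go to the lowest index (an arbitrary choice for this sketch).
    """
    k = len(observations[0])
    means = [fmean([obs[j] for obs in observations]) for j in range(k)]
    selected = max(range(k), key=lambda j: means[j])
    return selected, means


# Hypothetical data: N = 3 observations on k = 3 variates.
sample = [(1.0, 2.0, 0.5), (0.0, 3.0, 1.5), (2.0, 1.0, 1.0)]
selected, means = rule_b(sample)
print(selected, means)
```

The rule uses only the sample means, which is why the chapter calls it a "natural" procedure; the correlations enter only through the probability of a correct selection.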

Bechhofer (1954) introduced the indifference-zone philosophy
when solving the above problem for the case where $R = I_k$, i.e.,
when all components of X are mutually independent. Our objective
is to generalize his result in the multivariate setting.

Subset  formulation 

If the experimenter's goal is to select a subset of components
of X which will include the component associated with $\mu_{[k]}$, he
specifies $\{P^*\}$, $1/k < P^* < 1$, prior to experimentation. Letting
$PCS_R(\mu, R)$ be defined as above, we limit consideration to decision
procedures R which guarantee the probability requirement:

\[ \inf_{\mu, R} PCS_R(\mu, R) \ge P^* . \]


The  following  decision  rule  has  been  proposed  for  this  subset 
formulation  of  the  problem: 

Rule G: Include the component associated with $\bar X_i$ in the
selected subset if $\bar X_i \ge \max_{1 \le j \le k} \bar X_j - d^*$, where $d^* > 0$ is specified
in the units of the problem.

Our  task  is  then  to  determine  the  smallest  integer  N  for 
which  the  probability  requirement  is  guaranteed  when  Rule  G  is  used. 
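A minimal sketch of Rule G follows, under the reading that component i is retained when its sample mean is within d* of the largest sample mean. The name `rule_g` and the data are hypothetical.

```python
from statistics import fmean


def rule_g(observations, d_star):
    """Apply Rule G: keep every variate whose sample mean is at least
    (largest sample mean - d_star). Returns a list of 0-based indices.

    The variate with the largest sample mean is always kept, so the
    selected subset is never empty.
    """
    if d_star <= 0:
        raise ValueError("d_star must be positive")
    k = len(observations[0])
    means = [fmean([obs[j] for obs in observations]) for j in range(k)]
    cutoff = max(means) - d_star
    return [j for j in range(k) if means[j] >= cutoff]


# Hypothetical data: N = 3 observations on k = 3 variates.
sample = [(1.0, 2.0, 0.5), (0.0, 3.0, 1.5), (2.0, 1.0, 1.0)]
print(rule_g(sample, d_star=0.5))
print(rule_g(sample, d_star=1.0))
```

A larger yardstick d* retains more components, which raises the probability of a correct selection at the cost of a larger selected subset.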

This rule was introduced by Gupta (1956), where the subset
approach was first proposed. The problem solved by Gupta (1956)
assumed $R = I_k$ (independent components). Our objective is to generalize
his result in the multivariate setting.

In order to obtain solutions to these problems we will first
derive some preliminary results which will be used throughout the
present chapter. We assume, without loss of generality, that
$\mu_k \ge \mu_j$ $(j \ne k)$.

Lemma 1.1. Let

\[ Y_j^i = \frac{[\bar X_i - \bar X_j - (\mu_i - \mu_j)]\,(N/2)^{1/2}}{\sigma\,(1 - \rho_{ij})^{1/2}} \qquad (j \ne i) . \]

Then, for each fixed $i$, the $\{Y_j^i,\ j \ne i\}$ have a standard multivariate
normal distribution with

\[ \operatorname{corr}\{Y_j^i, Y_{j'}^i\} = \gamma_{jj'}^i = \frac{1 - \rho_{ij} - \rho_{ij'} + \rho_{jj'}}{2(1 - \rho_{ij})^{1/2}(1 - \rho_{ij'})^{1/2}} . \]

Proof. The result follows at once from the above definitions.
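The correlation formula of Lemma 1.1 is easy to evaluate numerically. The sketch below assumes a correlation matrix given as a nested list with 0-based indices and all off-diagonal entries strictly less than 1; the function name `gamma_matrix` is illustrative only.

```python
from math import sqrt


def gamma_matrix(R, i):
    """Correlation matrix of the standardized differences Y_j^i (j != i)
    of Lemma 1.1, computed from the correlation matrix R of X.

    R is a k x k nested list; i is the 0-based reference component.
    Requires R[i][j] < 1 for every j != i.
    """
    others = [j for j in range(len(R)) if j != i]

    def gamma(j, jp):
        numerator = 1.0 - R[i][j] - R[i][jp] + R[j][jp]
        denominator = 2.0 * sqrt(1.0 - R[i][j]) * sqrt(1.0 - R[i][jp])
        return numerator / denominator

    return [[gamma(j, jp) for jp in others] for j in others]


# Independent components (R = I_3): the differences have correlation 1/2.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(gamma_matrix(identity, 2))
```

With a common correlation in every off-diagonal position the same value 1/2 results, which is the simplification used in Section 1.2.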


For simplicity of notation, we now let

\[ Y_j = Y_j^k , \qquad \gamma_{jj'} = \gamma_{jj'}^k . \]


Lemma 1.2. Let the $\{Y_j,\ j \ne k\}$ be as in Lemma 1.1.
(a) If Rule B is used, then in $\Omega$ we have

\[ PCS \ge P\{Y_j > -a(N)(1 - \rho_{jk})^{-1/2} ,\ j \ne k\} \tag{1.1} \]

where $a(N) = (\delta^*/\sigma)(N/2)^{1/2}$.

(b) If Rule G is used, then we have

\[ PCS \ge P\{Y_j > -a(N)(1 - \rho_{jk})^{-1/2} ,\ j \ne k\} \tag{1.2} \]

where $a(N) = (d^*/\sigma)(N/2)^{1/2}$.

Proof. We use Lemma 1.1 and notice that, in (a),

\[ PCS = P\{\bar X_k > \bar X_j ,\ j \ne k\} = P\left\{Y_j > -\frac{(\mu_k - \mu_j)(N/2)^{1/2}}{\sigma(1 - \rho_{jk})^{1/2}} ,\ j \ne k\right\} , \]

while in (b),

\[ PCS = P\{\bar X_k > \bar X_j - d^* ,\ j \ne k\} = P\left\{Y_j > -\frac{(\mu_k - \mu_j + d^*)(N/2)^{1/2}}{\sigma(1 - \rho_{jk})^{1/2}} ,\ j \ne k\right\} . \]

QED

Our task for most of the present chapter is to minimize the
right-hand sides of (1.1) and (1.2) with respect to R. Formally,
these are identical problems, and thus we will not make a distinction,
as far as the minimization is concerned, between the indifference-zone
and the subset approach. The expressions (1.1) and (1.2)
depend on $\delta^*/\sigma$ or $d^*/\sigma$, which may be specified, instead of
$\sigma$ alone.

Lemma 1.3. Let S be the size of the selected subset associated
with Rule G. Then

\[ \text{(a)} \quad E(S \mid \mu, R) = \sum_{i=1}^{k} P\left\{Y_j^i > -\frac{(\mu_i - \mu_j + d^*)(N/2)^{1/2}}{\sigma(1 - \rho_{ij})^{1/2}} ,\ j \ne i\right\} , \]

where the $\{Y_j^i,\ j \ne i\}$ are as in Lemma 1.1.

(b) $\sup_{\mu, R} E(S \mid \mu, R) = k$, which occurs when $\mu_1 = \ldots = \mu_k$, and all
elements of R are equal to unity.

Proof. This result is a consequence of previous developments.


1.2.  Case  of  equal  correlations 

When the off-diagonal elements of R are known to be equal
to a common unknown $\rho$ $(-1/(k-1) \le \rho \le 1)$, the minimization
of (1.1) and (1.2) simplifies considerably. In this case,
$\gamma_{ij} = 1/2$ $(i \ne j)$, and the minimum occurs when $\rho = -1/(k-1)$,
in which case the k-variate distribution of X is degenerate, being
concentrated in a linear subspace of k - 1 dimensions. However,
the distribution of the $\{Y_j,\ j \ne k\}$ is not degenerate. Therefore,
one obtains for either (1.1) or (1.2),

\[ \inf PCS = P\{Y_j > -a(N)((k-1)/k)^{1/2} ,\ j \ne k\} \tag{1.3} \]

where the $\{Y_j,\ j \ne k\}$ are as in Lemma 1.1, with
$\gamma_{ij} = 1/2$ $(i \ne j)$.

The infimum in (1.3) was known to Milton (1963) and Gupta
(1963). They have provided tables for the distribution of the
$\{Y_j,\ j \ne k\}$, for several values of k. Using these tables, an
experimenter determines $h = h(k, P^*) > 0$, such that
$P\{Y_j > -h,\ j \ne k\} = P^*$, and upon equating

\[ a(N)((k-1)/k)^{1/2} = h , \]

a value of N then follows; the experimenter employs the smallest
integer not less than that value.
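In place of the published tables, the constant h and the resulting sample size can be approximated numerically. The sketch below is an assumption-laden illustration rather than the tabulated method: it uses the standard representation of k - 1 standard normal variates with common correlation 1/2 as $Y_j = (Z_j + Z_0)/\sqrt{2}$ with independent standard normal $Z$'s, so that $P\{Y_j > -h,\ j \ne k\} = \int \Phi(z + h\sqrt{2})^{k-1} \varphi(z)\,dz$, evaluated by Simpson's rule. All function names are hypothetical.

```python
from math import ceil, erf, exp, pi, sqrt


def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def prob_all_exceed(h, k, steps=4000, limit=8.0):
    """P(Y_j > -h for all j) for k-1 standard normals with pairwise
    correlation 1/2, by Simpson's rule on [-limit, limit] (steps even)."""
    width = 2.0 * limit / steps
    total = 0.0
    for s in range(steps + 1):
        z = -limit + s * width
        weight = 1 if s in (0, steps) else (4 if s % 2 else 2)
        total += weight * norm_cdf(z + h * sqrt(2.0)) ** (k - 1) * norm_pdf(z)
    return total * width / 3.0


def solve_h(k, p_star, tol=1e-8):
    """Bisection for h > 0 with prob_all_exceed(h, k) = p_star (1/k < p_star < 1)."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_all_exceed(mid, k) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_size(k, p_star, ratio):
    """Smallest integer N with a(N) * ((k-1)/k)**0.5 >= h, where
    a(N) = ratio * (N/2)**0.5 and ratio is delta*/sigma (or d*/sigma)."""
    a_needed = solve_h(k, p_star) * sqrt(k / (k - 1.0))
    return ceil(2.0 * (a_needed / ratio) ** 2)


print(sample_size(k=3, p_star=0.90, ratio=0.5))
```

For k = 2 the integral reduces to $\Phi(h)$, so h is simply the $P^*$ quantile of the standard normal distribution, which gives a quick check of the numerical routine.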

It  should  be  mentioned  that  Rule  B  has  many  optimum  properties 
when  the  correlations  are  equal.  For  a  large  class  of  "natural" 
loss  functions,  the  rule  has  uniformly  smallest  risk  function  among 
all  symmetrical  (invariant  under  permutation  of  components)  procedures, 
being minimax and admissible (cf. Eaton (1967a), Lehmann (1966),
Hall (1959)).

1.3.  Case  k  =  2 

Although  this  is  a  particular  case  of  the  preceding  section, 
we  state  the  result  explicitly,  so  that  it  may  be  compared  easily 
with the results of Section 1.4.

Here, since $k = 2$, (1.1) and (1.2) reduce to a univariate
normal integral, the minimum of which clearly occurs when $\rho_{12} = -1$.
Therefore,

\[ \inf PCS = P\{Y_1 > -a(N)\,2^{-1/2}\} , \tag{1.4} \]

where $Y_1$ is a standard univariate normal variate.


1.4.  Case  k  =  3 

Here the problem is considerably more complicated than for
$k = 2$. We wish to minimize the right-hand side of (1.1) and (1.2),

\[ PCS = P\{Y_1 > -a(N)(1 - \rho_{13})^{-1/2} ,\ Y_2 > -a(N)(1 - \rho_{23})^{-1/2}\} \]

over all permissible values of $\rho_{12}, \rho_{13}, \rho_{23}$, where the $\{Y_1, Y_2\}$
have a standard bivariate normal distribution with

\[ \operatorname{corr}(Y_1, Y_2) = \gamma_{12} = \frac{1 - \rho_{13} - \rho_{23} + \rho_{12}}{2(1 - \rho_{13})^{1/2}(1 - \rho_{23})^{1/2}} . \]

The region of Euclidean 3-space where R is positive semi-definite
is given by $\det R \ge 0$, $\rho_{ij}^2 \le 1$ $(i \ne j)$. The region
$\det R \ge 0$ is the ellipsoid

\[ 1 + 2\rho_{12}\rho_{13}\rho_{23} - \rho_{12}^2 - \rho_{13}^2 - \rho_{23}^2 \ge 0 . \]


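Lemma 1.4. $\partial PCS / \partial \rho_{12} > 0$ for $\rho_{13} \ne 1$ and $\rho_{23} \ne 1$.

Proof. Let $f_{\gamma_{12}}(y_1, y_2)$ be the p.d.f. of $\{Y_1, Y_2\}$. According
to the known relation (cf., for example, Plackett (1954)),

\[ \frac{\partial}{\partial \gamma_{12}} f_{\gamma_{12}}(y_1, y_2) = \frac{\partial^2}{\partial y_1 \partial y_2} f_{\gamma_{12}}(y_1, y_2) , \]

we have

\[ \frac{\partial PCS}{\partial \rho_{12}} = \int_{-a(N)/(1-\rho_{13})^{1/2}}^{\infty} \int_{-a(N)/(1-\rho_{23})^{1/2}}^{\infty} \frac{\partial^2}{\partial y_1 \partial y_2} f_{\gamma_{12}}(y_1, y_2)\, dy_2\, dy_1 \cdot \frac{\partial \gamma_{12}}{\partial \rho_{12}} \]

\[ = f_{\gamma_{12}}\left(\frac{-a(N)}{(1-\rho_{13})^{1/2}} , \frac{-a(N)}{(1-\rho_{23})^{1/2}}\right) (1/2)(1-\rho_{13})^{-1/2}(1-\rho_{23})^{-1/2} > 0 . \]

QED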


Some of the ideas underlying many proofs in this thesis,
including the one above, derive from a basic paper of Slepian (1962).

It is easy to check that the inf of PCS does not occur
when either $\rho_{13}$ or $\rho_{23}$ equals unity. Hence, this case is excluded
in the following discussion.

Lemma 1.5. inf PCS occurs when $\det R = 0$.

Proof. Suppose we fix $\rho_{13}$ and $\rho_{23}$. By the previous lemma, we
would set $\rho_{12}$ at its smallest possible value, which is the smallest
root of the quadratic equation $\det R = 0$; thus we obtain

\[ \rho_{12} = \rho_{13}\rho_{23} - (1 - \rho_{13}^2)^{1/2}(1 - \rho_{23}^2)^{1/2} \ge -1 . \]

QED
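The smallest root of det R = 0, viewed as a quadratic in $\rho_{12}$ for fixed $\rho_{13}$ and $\rho_{23}$, can be checked numerically. The helper names below are assumptions made for this illustration.

```python
from math import sqrt


def det_r(r12, r13, r23):
    """Determinant of the 3 x 3 correlation matrix with the given entries."""
    return 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2


def smallest_r12(r13, r23):
    """Smallest root in r12 of det R = 0 for fixed r13 and r23."""
    return r13 * r23 - sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))


# Hypothetical values: with r13 = r23 = t the root equals 2*t**2 - 1.
t = 0.3
root = smallest_r12(t, t)
print(root, det_r(root, t, t))
```

In the symmetric case $\rho_{13} = \rho_{23} = t$ the root is $2t^2 - 1$, the form used in the solution studied later in this section.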

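We proceed directly to the minimization of PCS. Let us
define the following Lagrangean function:

\[ F = PCS + \lambda \det R . \]

The parameter point R which leads to an infimum of PCS, subject
to the restriction $\det R = 0$, must satisfy the following equations:

\[ \frac{\partial F}{\partial \rho_{12}} = f_{\gamma_{12}}\left(\frac{-a(N)}{(1-\rho_{13})^{1/2}} , \frac{-a(N)}{(1-\rho_{23})^{1/2}}\right) \frac{1}{2(1-\rho_{13})^{1/2}(1-\rho_{23})^{1/2}} + 2\lambda(\rho_{13}\rho_{23} - \rho_{12}) = 0 , \tag{1.5} \]

\[ \frac{\partial F}{\partial \rho_{13}} = f_{\gamma_{12}}\left(\frac{-a(N)}{(1-\rho_{13})^{1/2}} , \frac{-a(N)}{(1-\rho_{23})^{1/2}}\right) \frac{\rho_{12} - \rho_{23} + \rho_{13} - 1}{4(1-\rho_{23})^{1/2}(1-\rho_{13})^{3/2}} + \frac{a(N)}{2(1-\rho_{13})^{3/2}} \int_{-a(N)/(1-\rho_{23})^{1/2}}^{\infty} f_{\gamma_{12}}\left(\frac{-a(N)}{(1-\rho_{13})^{1/2}} , y_2\right) dy_2 + 2\lambda(\rho_{23}\rho_{12} - \rho_{13}) = 0 , \tag{1.6} \]

\[ \frac{\partial F}{\partial \rho_{23}} = f_{\gamma_{12}}\left(\frac{-a(N)}{(1-\rho_{13})^{1/2}} , \frac{-a(N)}{(1-\rho_{23})^{1/2}}\right) \frac{\rho_{23} + \rho_{12} - \rho_{13} - 1}{4(1-\rho_{13})^{1/2}(1-\rho_{23})^{3/2}} + \frac{a(N)}{2(1-\rho_{23})^{3/2}} \int_{-a(N)/(1-\rho_{13})^{1/2}}^{\infty} f_{\gamma_{12}}\left(y_1 , \frac{-a(N)}{(1-\rho_{23})^{1/2}}\right) dy_1 + 2\lambda(\rho_{12}\rho_{13} - \rho_{23}) = 0 , \tag{1.7} \]

\[ \frac{\partial F}{\partial \lambda} = \det R = 0 . \tag{1.8} \]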


By the symmetry of equations (1.6) and (1.7) with respect to
$\rho_{13}$ and $\rho_{23}$, one is led to study a solution of the form

\[ \rho_{13} = \rho_{23} = t , \]

which consequently implies by (1.8) and Lemma 1.4, that

\[ \rho_{12} = 2t^2 - 1 . \]

Moreover, by substitution, we have $\gamma_{12} = -t$. With such a solution,
equations (1.6) and (1.7) become identical, and in order to find $t$
we must eliminate the Lagrange multiplier $\lambda$ between equations
(1.5) and (1.6). After simplifications, we arrive at

\[ \int_{-a(N)/(1-t)^{1/2}}^{\infty} f_{-t}\left(\frac{-a(N)}{(1-t)^{1/2}} , y\right) dy = \frac{(1-t)^{3/2}}{a(N)}\, f_{-t}\left(\frac{-a(N)}{(1-t)^{1/2}} , \frac{-a(N)}{(1-t)^{1/2}}\right) . \tag{1.9} \]

Using the factorization $f(x,y) = f(y \mid x) f(x)$ for the density
inside the integral, and simplifying further still, we obtain

\[ \int_{-b}^{\infty} (2\pi)^{-1/2} \exp(-y^2/2)\, dy = (2\pi)^{-1/2} (1/b) \exp(-b^2/2) , \tag{1.10} \]

where

\[ b = a(N)(1 + t)^{1/2} / (1 - t) . \]

Equation (1.10) has a unique solution, approximately $b = .5$, which gives

\[ a(N) = .5(1 - t)(1 + t)^{-1/2} \]

and

\[ PCS = P\left\{Y_i > \frac{-a(N)}{(1-t)^{1/2}} ,\ i = 1,2\right\} = P\left\{Y_i > -\frac{.5(1-t)^{1/2}}{(1+t)^{1/2}} ,\ i = 1,2\right\} , \]

where $\operatorname{corr}(Y_1, Y_2) = -t$.

For numerical evaluations of PCS it is convenient to start
with a fixed value of $t$ $(-1 < t < 1)$, and then obtain $a(N)$ and
PCS. Some rough numerical calculations are given in Table 1.1.

The  purpose  of  this  table  is  to  illustrate  the  variation  of  PCS 
and  a(N)  with  t  ,  rather  than  to  provide  the  reader  with  a  working 
device.  Table  1.1  was  computed  using  the  National  Bureau  of  Standards 
(1959)  tables  of  the  bivariate  normal  integral. 
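The calculation behind Table 1.1 can be approximated without bivariate normal tables. The sketch below is an illustration under stated assumptions, not the original computation: it solves (1.10), read as $\Phi(b) = \varphi(b)/b$, by bisection, and evaluates $P\{Y_1 > -c, Y_2 > -c\}$ for correlation $r$ as $\int_{-c}^{\infty} \varphi(y)\,\Phi((c + ry)/(1 - r^2)^{1/2})\,dy$ by Simpson's rule. The function names are hypothetical, and the default b = .5 follows the rounded value used in the text.

```python
from math import erf, exp, pi, sqrt


def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def solve_b(lo=0.1, hi=2.0, tol=1e-10):
    """Bisection for the positive root of Phi(b) - phi(b)/b = 0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) - norm_pdf(mid) / mid < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quadrant_prob(c, r, steps=4000, upper=8.0):
    """P(Y1 > -c, Y2 > -c) for a standard bivariate normal pair with
    correlation r (|r| < 1), by Simpson's rule on [-c, upper]."""
    s = sqrt(1.0 - r * r)
    width = (upper + c) / steps
    total = 0.0
    for i in range(steps + 1):
        y = -c + i * width
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * norm_pdf(y) * norm_cdf((c + r * y) / s)
    return total * width / 3.0


def table_row(t, b=0.5):
    """Return (a(N), PCS) for a given t, with corr(Y1, Y2) = -t."""
    a = b * (1.0 - t) / sqrt(1.0 + t)
    return a, quadrant_prob(a / sqrt(1.0 - t), -t)


print("root of (1.10):", round(solve_b(), 4))
for t in (-0.9, -0.5, 0.0, 0.5, 0.9):
    a, pcs = table_row(t)
    print(f"t = {t:5.2f}   a(N) = {a:5.2f}   PCS = {pcs:4.2f}")
```

Passing `b=solve_b()` to `table_row` uses the numerically computed root instead of the rounded value, which changes the entries only slightly.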

One notes that as $a(N)$ increases so does PCS, as is to
be expected; but as $a(N) \to 0$, PCS attains values less than 1/3.
In other words, for small values of $a(N)$ one does better by simply
selecting one of the three components at random rather than by using
Rules B or G. Therefore, for small values of $a(N)$, these rules are
not minimax (with respect to simple 0-1 loss functions).

Another curious fact is that, for small a(N), the least favorable configuration of R is very close to a correlation matrix all entries of which are equal to unity. However, this is also the most favorable configuration of R, since then PCS = 1. In other words, for a(N) close to zero, the least favorable configuration of R is "close" to the most favorable configuration of R. One may interpret this as happening when $\sigma$ is large compared to $\delta^*$ or $d^*$, in which case our intuition fails.

We have not been able to prove analytically that the solution (1.10) of equations (1.5), (1.6), (1.7) and (1.8) which we selected is indeed the one which leads to the global minimum of PCS. However, some limited numerical results do indicate that this is in fact the global minimum. We recommend that more extensive numerical
TABLE 1.1

Values of the Infimum of the Probability of a Correct
Selection as a Function of t (k = 3)

    t      a(N)     PCS

  -.9      3.00     .98
  -.7      1.55     .83
  -.5      1.06     .69
  -.2       .67     .55
   0        .50     .48
   .2       .37     .41
   .3       .31     .37
   .4       .26     .33
   .5       .21     .30
   .6       .16     .27
   .7       .12     .23
   .8       .07     .19
   .9       .04     .13
   .99      .00     .04
computations be carried out in the future, and hope to do so ourselves.

Table 1.1 shows that when a(N) → ∞, which may also be thought of as N → ∞, the least favorable configuration is near $\rho_{13} = \rho_{23} = -1$, $\rho_{12} = 1$. One way to see this is to notice that as a(N) → ∞, equations (1.5), (1.6), (1.7) and (1.8) become,

$$2\lambda(\rho_{13}\rho_{23} - \rho_{12}) = 0$$
$$2\lambda(\rho_{23}\rho_{12} - \rho_{13}) = 0$$
$$2\lambda(\rho_{12}\rho_{13} - \rho_{23}) = 0$$
$$\det R = 0 .$$

The only solutions of these equations are $\rho_{12} = \rho_{13} = \rho_{23} = 1$, the most favorable configuration, and $\rho_{13} = \rho_{23} = -1$, $\rho_{12} = 1$, the least favorable configuration.
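As a brief check, under the assumption that the limiting equations are the stationarity conditions of $\lambda \det R$ alone, the determinant of the 3 × 3 correlation matrix and its partial derivatives can be written out and evaluated at the two configurations named in the text:

```latex
\det R = 1 + 2\rho_{12}\rho_{13}\rho_{23} - \rho_{12}^2 - \rho_{13}^2 - \rho_{23}^2 ,
\qquad
\frac{\partial \det R}{\partial \rho_{12}} = 2(\rho_{13}\rho_{23} - \rho_{12}),\quad
\frac{\partial \det R}{\partial \rho_{13}} = 2(\rho_{23}\rho_{12} - \rho_{13}),\quad
\frac{\partial \det R}{\partial \rho_{23}} = 2(\rho_{12}\rho_{13} - \rho_{23}).

% rho_12 = rho_13 = rho_23 = 1:
%   each bracket equals 1 - 1 = 0, and det R = 1 + 2 - 3 = 0.
% rho_13 = rho_23 = -1, rho_12 = 1:
%   (-1)(-1) - 1 = 0,  (-1)(1) - (-1) = 0,  (1)(-1) - (-1) = 0,
%   and det R = 1 + 2(1)(-1)(-1) - 3 = 0.
```

Both configurations therefore satisfy all four limiting equations for any value of $\lambda$.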




1.5.  Case  k  >  3 

In this section our results are more tentative than the results of the previous section, since we have not made any numerical computations to verify that what we obtain is indeed a least favorable configuration. The present section could be written in parallel with the previous one, the basic ideas being the same, except for the much more involved algebra. Instead, we simply give below the main results, without proofs. Let

$$PCS = P\{Y_j > -a(N)(1-\rho_{jk})^{-1/2},\ j \ne k\}$$

where the $(Y_j,\ j \ne k)$ are as in Lemma 1.1.
Lemma 1.6. $\partial PCS/\partial\rho_{ij} > 0$ for $(1 \le i < j \le k-1)$ if $\rho_{ik} \ne 1$ $(1 \le i \le k-1)$.

Lemma 1.7. inf PCS occurs when det R = 0.

Consider the Lagrangean function

$$F = PCS + \lambda \det R .$$

It can be shown that the equations

$$\frac{\partial F}{\partial \rho_{ij}} = 0 \ (1 \le i < j \le k), \qquad \frac{\partial F}{\partial \lambda} = 0,$$

admit a solution of the form

$$\rho_{1k} = \cdots = \rho_{k-1,k} = t \quad (-1 < t < 1),$$

$$\rho_{12} = \cdots = \rho_{k-2,k-1} = \{(k-1)t^2 - 1\}/(k-2) .$$

Moreover, by substitution,

$$\rho = \gamma_{ij} = \{(k-3) - (k-1)t\}/(2(k-2)) .$$

In the present context, equation (1.10) is a particular case of (1.11), when k = 3.


(1.11)  /.../  fp  ( z j , . . . ,  dz j . .  .dz 


k-2 


_ _  exp{  .  i  a2(N)(l-p)  , 

2(2.)1/2atN)Cl-p2)1/2  P'  2  » 

/  •  •  •  /  f p  (Wj , . . . , j)  dWp . . .  dw^_  j  , 
where the limits of integration in the left-hand side are from $-a(N)(1-\rho)^{1/2}(1-t)^{-1/2}(1+\rho)^{-1/2}$ to $\infty$, while the right-hand side limits of integration are from $-a(N)(1-2\rho)(1+\rho)^{1/2}(1-t)^{-1/2}(1-\rho)^{-1/2}(2\rho+1)^{-1/2}$ to $\infty$. Moreover, $f_{\Gamma_i}$ (i = 1,2) are the p.d.f.'s of standard multivariate normal distributions with correlation matrices $\Gamma_i$, where $\Gamma_1$ has all its off-diagonal elements equal to $\rho/(1+\rho)$, while $\Gamma_2$ has all its off-diagonal elements equal to $\rho/(2\rho+1)$.

(1.11) does not lend itself to an easy solution as did (1.10), where we found b and consequently computed Table 1.1. Although we have not pursued numerical computations for k > 3, we recommend that (1.11) be used as follows: for fixed values of t (-1 < t < 1), (1.11) gives a unique value of a(N); then, with t and a(N), one computes PCS. As t varies from 1 to -1, a(N) ranges from 0 to ∞, and PCS from 0 to 1.

Again for k > 3, the PCS may attain values less than 1/k, if a(N) is sufficiently small. For example, if we take t = (k - 3)/(k - 1), implying $\rho = 0$, then

$$PCS = \left\{\int_{-a(N)(1-t)^{-1/2}}^{\infty} f(z)\,dz\right\}^{k-1}$$

where f(z) is a standard univariate normal density. For a(N) very small,

$$PCS \approx 2^{-(k-1)} < 1/k .$$
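The example above is easy to evaluate directly. The sketch below assumes the threshold $-a(N)(1-t)^{-1/2}$ from the definition of PCS at the start of this section; the function name is hypothetical.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pcs_rho_zero(k, a):
    """PCS when t = (k-3)/(k-1), so that rho = 0 and the Y_j are independent."""
    t = (k - 3.0) / (k - 1.0)
    # k - 1 independent standard normal variates, each exceeding -a (1 - t)^(-1/2).
    return Phi(a / math.sqrt(1.0 - t)) ** (k - 1)
```

For k = 4 and a(N) = 0 this gives $2^{-3} = .125$, which is below 1/k = .25; increasing a(N) in the call shows where the PCS crosses 1/k.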
While for k = 3 we were able to show computationally, in a few cases, that the minimum obtained is indeed a global minimum, for k > 3 these computational results are very difficult to obtain because of the unavailability of tables of general multivariate normal integrals of dimension greater than 2. It may be possible that a proof exists for the uniqueness of the minimum, but we were unable to provide it.

1.6. A conservative approximation to the sample size when k ≥ 3

While expressions such as (1.11) seem to be unmanageable, a lower bound on PCS may be obtained using Bonferroni's inequality as given in Feller (1968). Indeed, for a collection of p events $A_1,\ldots,A_p$,

$$P\Big(\bigcap_{i=1}^{p} A_i\Big) = 1 - P\Big(\bigcup_{i=1}^{p} A_i^c\Big) \ge 1 - \sum_{i=1}^{p} P(A_i^c) = \sum_{i=1}^{p} P(A_i) - (p-1),$$

where $A_i^c$ is the complement of $A_i$, and Boole's inequality has been used.

Therefore, since we know the minimum when k = 2, if we take any k ≥ 3,

$$PCS = P(Y_i > -a(N)(1-\rho_{ik})^{-1/2},\ i \ne k)$$
$$\ge \sum_{i=1}^{k-1} P(Y_i > -a(N)(1-\rho_{ik})^{-1/2}) - (k-2)$$
$$\ge (k-1)\,P(Y_1 > -a(N)\,2^{-1/2}) - (k-2) .$$

Setting the right-hand side equal to P*, one may easily solve for N using tables of the standard univariate normal distribution.
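In place of tables, the last bound can be inverted numerically. The sketch below solves only for a(N); converting a(N) into N requires the definition of a(N) given earlier in the chapter, which is not repeated here. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def bonferroni_a(k, p_star):
    """Value of a(N) for which (k-1) P(Y > -a/sqrt(2)) - (k-2) equals P*."""
    # (k-1) Phi(a/sqrt(2)) - (k-2) = P*  <=>  Phi(a/sqrt(2)) = 1 - (1-P*)/(k-1)
    q = 1.0 - (1.0 - p_star) / (k - 1)
    return math.sqrt(2.0) * NormalDist().inv_cdf(q)

def bonferroni_bound(k, a):
    """Right-hand side of the Bonferroni lower bound on PCS."""
    return (k - 1) * NormalDist().cdf(a / math.sqrt(2.0)) - (k - 2)
```

For instance, `bonferroni_a(3, 0.9)` is approximately 2.33, and substituting it back into `bonferroni_bound` recovers P* = .9.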

It is also possible to use the results we have for k = 3, possibly in conjunction with results for k = 2, to obtain a Bonferroni approximation. For example, suppose that k = 5. Then,

$$PCS \ge P(Y_1 > -a(N)(1-\rho_{15})^{-1/2},\ Y_2 > -a(N)(1-\rho_{25})^{-1/2})$$
$$\quad + P(Y_3 > -a(N)(1-\rho_{35})^{-1/2},\ Y_4 > -a(N)(1-\rho_{45})^{-1/2}) - 1$$
$$\ge 2P(Y_i > -.5(1-t)^{1/2}(1+t)^{-1/2},\ i = 1,2) - 1 .$$

Setting the right-hand side equal to P*, with the aid of Table 1.1, one determines N.

1.7.  A  sequential  procedure 

Paulson  (1964)  devised  a  sequential  procedure  for  the  problem 
of  selecting  the  normal  population  with  the  largest  population  mean, 
when  the  variances  are  known  and  equal.  This  procedure  is  fully 
sequential  and  truncated,  in  the  sense  that  populations  are  eliminated 
as  sampling  proceeds  and  there  is  a  predetermined  upper  bound  on  the 
total  number  of  stages.  In  this  section  we  show  how  Paulson's 
procedure  can  be  slightly  modified  to  handle  the  problem  of  correlated 
variates,  when  the  variances  are  known,  but  not  necessarily  equal. 
Since  the  proof  that  this  procedure  guarantees  the  PCS  over  the 
preference  region  parallels  Paulson's  proof,  we  prove  only  what  is 
strictly  necessary  and  refer  the  reader  to  Paulson's  paper  for  the 




remaining  details.  In  what  follows,  we  will  use,  as  far  as  possible, 
Paulson's  notation. 

Let $(X_{1s},\ldots,X_{ks})$, $s = 1,2,\ldots$, be a sequence of independent vectors each with a multivariate normal distribution with unknown population means $(\mu_1,\ldots,\mu_k)$, known population variances $(\sigma_1^2,\ldots,\sigma_k^2)$, and unknown population correlations $\rho_{ij} = \mathrm{corr}(X_{is},X_{js})$. Our objective is to select, with probability at least P*, the component with the largest mean, whenever $\mu_{[k]} - \mu_{[k-1]} \ge \delta^* > 0$.

Let $0 < \lambda < \delta^*$ be an arbitrary fixed number, and set $\bar\sigma^2 = \max_{i \ne j}(\sigma_i + \sigma_j)^2$. Next define,

$$a_\lambda = [\bar\sigma^2/2(\delta^* - \lambda)] \log((k-1)/(1-P^*)),$$

and $W_\lambda$ = the largest integer less than $a_\lambda/\lambda$. (Note: Our definition of $a_\lambda$ is different from Paulson's.) Then Paulson describes his

Rule $P_\lambda$: "At the first stage of the experiment we take one observation from each variate, obtaining ... $(x_{11}, x_{21}, \ldots, x_{k1})$. Then we eliminate from further consideration any variate j for which

$$x_{j1} < \max\{x_{11}, x_{21}, \ldots, x_{k1}\} - a_\lambda + \lambda .$$

If all but one variate are eliminated after the first stage of the experiment, we stop the experiment and select the remaining variate as the best one. Otherwise we go on to the second stage of the experiment and take one observation on each variate not eliminated after the first stage. Proceeding by induction, at the rth stage of the experiment $(r = 2,3,\ldots,W_\lambda)$ we take one observation on each variate not eliminated after the (r - 1) stage, and then eliminate any remaining variate j for which

$$\sum_{s=1}^{r} x_{js} < \max\Big\{\sum_{s=1}^{r} x_{vs}\Big\} - a_\lambda + r\lambda ,$$

where the max is taken over all variates left after the (r - 1) stage. If only one variate is left after the rth stage, the experiment is terminated and the remaining variate is selected, otherwise we go on to the (r + 1) stage. If more than one variate remains after the $W_\lambda$ stage, the experiment is terminated at the $(W_\lambda + 1)$ stage by selecting the remaining variate for which the sum of the $(W_\lambda + 1)$ observations is a maximum."
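The rule quoted above can be sketched as a short routine. This is an illustrative reading of the rule, not Paulson's own code: the function names are hypothetical, and `draw` is an assumed user-supplied function returning one observation vector per stage (for simplicity it returns all k components, of which only those of surviving variates are used).

```python
import math

def paulson_constants(sigmas, delta_star, lam, p_star):
    """Return (a_lambda, W_lambda) as defined in this section."""
    k = len(sigmas)
    sbar2 = max((sigmas[i] + sigmas[j]) ** 2
                for i in range(k) for j in range(k) if i != j)
    a = sbar2 / (2.0 * (delta_star - lam)) * math.log((k - 1) / (1.0 - p_star))
    w = math.ceil(a / lam) - 1   # largest integer strictly less than a/lam
    return a, w

def run_rule(draw, k, a, lam, w):
    """Apply the elimination rule; return (selected index, stopping stage)."""
    alive = list(range(k))
    sums = [0.0] * k
    for r in range(1, w + 1):
        x = draw(r)
        for j in alive:
            sums[j] += x[j]
        best = max(sums[j] for j in alive)
        # Keep variate j unless its cumulative sum falls below best - a + r*lam.
        alive = [j for j in alive if sums[j] >= best - a + r * lam]
        if len(alive) == 1:
            return alive[0], r
    # Truncation: one more stage, then select the largest cumulative sum.
    x = draw(w + 1)
    for j in alive:
        sums[j] += x[j]
    return max(alive, key=lambda j: sums[j]), w + 1
```

Since $r\lambda < a_\lambda$ for every $r \le W_\lambda$, the variate with the largest cumulative sum is never eliminated, so the set of surviving variates is never empty.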

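Lemma 1.8. For each $0 < \lambda < \delta^*$, Rule $P_\lambda$ guarantees the probability requirement

$$\inf_{\Omega} PCS_{P_\lambda}(\mu, R) \ge P^*$$

where

$$\Omega = \{(\mu,R) \mid \mu_{[k]} - \mu_{[k-1]} \ge \delta^*,\ R \text{ is a correlation matrix}\} .$$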


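Proof. It follows from the lines at the bottom of p. 176 of Paulson's paper that in $\Omega$,

$$P(\text{incorrect selection}) \le \sum_{v=1}^{k-1} P\Big(\sum_{s=1}^{n} X_{ks} < \sum_{s=1}^{n} X_{vs} - a_\lambda + n\lambda \ \text{for some } n < \infty\Big)$$

and,

$$P\Big(\sum_{s=1}^{n} (X_{vs} - X_{ks} + \lambda) > a_\lambda \ \text{for some } n < \infty\Big)$$
$$\le \exp\left[\frac{-2(\mu_k - \mu_v - \lambda)a_\lambda}{\sigma_v^2 + \sigma_k^2 - 2\sigma_v\sigma_k\rho_{vk}}\right] \le \exp\left[\frac{-2(\delta^* - \lambda)a_\lambda}{\sigma_v^2 + \sigma_k^2 - 2\sigma_v\sigma_k\rho_{vk}}\right]$$
$$\le \exp\left[\frac{-2(\delta^* - \lambda)a_\lambda}{(\sigma_v + \sigma_k)^2}\right] \le \exp\left[\frac{-2(\delta^* - \lambda)a_\lambda}{\bar\sigma^2}\right] = \frac{1-P^*}{k-1} .$$

Therefore,

$$P(\text{incorrect selection}) \le 1 - P^* \quad\text{and}\quad PCS \ge P^* .$$

In the first inequality above we have used the fact that the equation

$$1 = E e^{t(X_v - X_k + \lambda)} = \exp\{t(\mu_v - \mu_k + \lambda) + \tfrac{t^2}{2}(\sigma_v^2 + \sigma_k^2 - 2\sigma_v\sigma_k\rho_{vk})\}$$

has the unique nonzero root

$$t_0 = -2(\mu_v - \mu_k + \lambda)/(\sigma_v^2 + \sigma_k^2 - 2\sigma_v\sigma_k\rho_{vk}) .$$

QED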




CHAPTER  2 


SELECTION  OF  THE  VARIATE  WITH  THE  SMALLEST  POPULATION  VARIANCE 
FROM A SINGLE MULTIVARIATE NORMAL POPULATION

2.0.  Introduction 

The problem studied in the present chapter was motivated by the problem posed in Section 4.3 of Chapter 4. The asymptotic solution provided by Theorem 2.3 will be crucial to the developments of Chapters 3 and 4.

In this chapter we study single-stage procedures for selecting the variate with the smallest population variance from a single k-variate normal distribution. We formulate the general problem in Section 2.1. In Section 2.2 we obtain exact small-sample results for k = 2. However, when k > 2, it does not seem possible to extend the analysis for k = 2, as we point out in Section 2.3. In Section 2.4 we show how a conservative approximation to the single-stage sample size can be obtained. In Section 2.5 we develop a large-sample solution for the general case k ≥ 3. For k ≥ 3, and arbitrary correlation matrix, it turns out (perhaps surprisingly) that the least favorable configuration of the correlation matrix depends on N, the single-stage sample size, in a very complicated way. This is reminiscent of the results of Chapter 1. The large-sample results of the present chapter are special cases of the results of Section 3.1 of Chapter 3. These large-sample results, although stated in a normal framework, are valid for large classes of multivariate distributions, for which Lemma 2.8 is also true.
2.1. Formulation of the problem

We consider a k-variate normal population with population means $(\mu_1,\ldots,\mu_k)$, population variances $(\sigma_1^2,\ldots,\sigma_k^2)$ and population correlations $\rho_{ij}$ $(1 \le i < j \le k)$. We denote the covariance matrix by $\Sigma = \{\sigma_{ij}\} = \sigma R \sigma$, where $\sigma = \mathrm{diag}(\sigma_1,\ldots,\sigma_k)$ and $R = \{\rho_{ij}\}$ are $k \times k$ matrices. Therefore, $\sigma_{ii} = \sigma_i^2$ are the variances. Let the ranked values of the $\sigma_i^2$ be $\sigma_{[1]}^2 \le \ldots \le \sigma_{[k]}^2$. The experimenter does not have any prior knowledge concerning the values of the parameters of this multivariate normal population, or of the pairing of the $\sigma_{[i]}^2$ with the variates.

Indifference-zone formulation

The experimenter's goal is to select the variate associated with the smallest population variance. Two constants $\{\theta^*, P^*\}$, $\theta^* > 1$, $1/k < P^* < 1$, are specified prior to experimentation. We denote the probability of a correct selection when decision procedure $\mathcal{R}$ is used by $PCS_{\mathcal{R}}(\sigma,R)$, and restrict consideration to decision procedures which guarantee the probability requirement:

(2.1)  $$\inf_{\Omega} PCS_{\mathcal{R}}(\sigma,R) \ge P^*$$

where

$$\Omega = \{(\sigma,R) \mid \sigma_{[2]}^2 \ge \theta^*\sigma_{[1]}^2,\ R \text{ a correlation matrix}\} .$$
Bechhofer and Sobel (1954) proposed the following decision procedure, when considering this problem for the case $R = I_k$. A sample of N independent vector observations, $(X_{1\alpha},\ldots,X_{k\alpha})$ $(1 \le \alpha \le N)$, is taken and one computes,

$$a_{ii} = \sum_{\alpha=1}^{N} (X_{i\alpha} - \bar X_i)^2, \quad\text{where}\quad \bar X_i = \sum_{\alpha=1}^{N} X_{i\alpha}/N \quad (1 \le i \le k) .$$

Rule BS: Assert that the component associated with $a_{[1]} = \min\{a_{11},\ldots,a_{kk}\}$ has population variance $\sigma_{[1]}^2$.

Our task is to determine the smallest sample size N necessary to guarantee the probability requirement (2.1) when Rule BS is used and R is an unknown correlation matrix.

Subset formulation

In certain situations, the experimenter may be interested in the selection of a subset of variates, which includes the variate with the smallest variance. A constant $\{P^*\}$, $1/k < P^* < 1$, is specified prior to experimentation. Letting $PCS_{\mathcal{R}}(\sigma,R)$ be defined as above, we restrict consideration to decision procedures which guarantee the probability requirement:

(2.2)  $$\inf_{\sigma,R} PCS_{\mathcal{R}}(\sigma,R) \ge P^* .$$

The following decision procedure, proposed by Gupta and Sobel (1954), when considering this problem for the case $R = I_k$, will be used:

Rule GS: Include the variate associated with $a_{ii}$ in the selected subset if $a_{ii} \le d^* a_{[1]}$, where $d^* > 1$ is a specified constant.

Our objective is to find the smallest sample size N which will guarantee the probability requirement (2.2) when Rule GS is employed and R is an unknown correlation matrix.

Throughout this chapter, we assume, without loss of generality, that $\sigma_1^2 < \sigma_j^2$ $(j \ne 1)$. No consideration will be given to the population means, since their configuration is irrelevant for our purposes.


2.2.  Case  k  =  2 

In this section we consider the case k = 2, i.e., the parent population is bivariate normal. Writing $\rho_{12} = \rho$ and $\Sigma^{-1} = (\sigma^{ij})$, we have

Lemma 2.1. The joint p.d.f. of $a_{11}$ and $a_{22}$ is

(2.3)  $$p_{a_{11},a_{22}}(y_1,y_2) = \sum_{j=0}^{\infty} c_j(\rho) \prod_{i=1}^{2} \frac{(\sigma^{ii})^{\frac{n}{2}+j}\, y_i^{\frac{n}{2}+j-1} \exp(-\sigma^{ii}y_i/2)}{2^{\frac{n}{2}+j}\,\Gamma(\frac{n}{2}+j)}, \quad y_i > 0,$$

where

$$c_j(\rho) = (1-\rho^2)^{n/2}\,\frac{\rho^{2j}\,\Gamma(\frac{n}{2}+j)}{\Gamma(\frac{n}{2})\,j!} \ge 0, \qquad \sum_{j=0}^{\infty} c_j(\rho) = 1 .$$

Proof. Let $A = (a_{ij})$, $a_{ij} = \sum_{\alpha=1}^{N} (X_{i\alpha} - \bar X_i)(X_{j\alpha} - \bar X_j)$. Then A has a Wishart density. Make the transformation of variables $a_{11} = a_{11}$, $a_{22} = a_{22}$, $r_{12} = a_{12}a_{11}^{-1/2}a_{22}^{-1/2}$, and then obtain (2.3) as the marginal p.d.f. of $(a_{11},a_{22})$. Note that the joint p.d.f. of $(a_{11},a_{22})$ is a weighted sum of products of gamma densities. QED
Lemma 2.2. Let

$$v = \frac{\sigma^{22}a_{22}}{\sigma^{11}a_{11}} .$$

Then the p.d.f. of v is

(2.4)  $$p_v(z) = \sum_{j=0}^{\infty} c_j(\rho)\,\frac{\Gamma(n+2j)}{(\Gamma(j+n/2))^2}\, z^{\frac{n}{2}+j-1}(1+z)^{-(n+2j)} \equiv \sum_{j=0}^{\infty} c_j(\rho) f_j(z), \quad z > 0 .$$

Proof. In (2.3) make the transformation

$$y_1 = a_{11}, \quad v = \frac{\sigma^{22}a_{22}}{\sigma^{11}a_{11}},$$

then integrate out $y_1$, obtaining (2.4) as a final result. The p.d.f. of v is a weighted sum of central F densities. QED

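Lemma 2.3. Define

$$b_j = \int_{1/\theta^*}^{\infty} f_j(z)\,dz .$$

Then

$$b_0 \le b_1 \le b_2 \le \cdots .$$

Proof. For j ≥ 1,

(2.5)  $$b_j - b_{j-1} = \frac{\Gamma(n+2j)}{(\Gamma(\frac{n}{2}+j))^2} \int_{1/\theta^*}^{\infty} z^{\frac{n}{2}+j-1}(1+z)^{-(n+2j)}\,dz - \frac{\Gamma(n+2j-2)}{(\Gamma(\frac{n}{2}+j-1))^2} \int_{1/\theta^*}^{\infty} z^{\frac{n}{2}+j-2}(1+z)^{-(n+2j-2)}\,dz .$$

It is easy to show that, if we integrate by parts the first integral in (2.5), and then twice integrate by parts the second integral in (2.5), we obtain,

$$b_j - b_{j-1} = \frac{\Gamma(n+2j-2)}{\Gamma(\frac{n}{2}+j-1)\,\Gamma(\frac{n}{2}+j)}\,(1/\theta^*)^{\frac{n}{2}+j-1}(1+1/\theta^*)^{-(n+2j-1)}(1-1/\theta^*) \ge 0 .$$

QED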
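The monotonicity of the $b_j$ can be checked numerically. The sketch below is a hedged illustration with hypothetical function names: it assumes the substitution $u = z/(1+z)$, which turns each $b_j$ into the upper tail of a symmetric Beta distribution with both parameters equal to $n/2 + j$, and compares the numerical difference $b_j - b_{j-1}$ with the closed-form increment obtained from the standard recurrences of the incomplete beta function.

```python
import math

def beta_tail(m, u0, steps=2000):
    """P(U > u0) for U ~ Beta(m, m), 0 < u0 <= 1/2, using symmetry about 1/2."""
    logc = math.lgamma(2.0 * m) - 2.0 * math.lgamma(m)
    f = lambda u: math.exp(logc + (m - 1.0) * (math.log(u) + math.log(1.0 - u)))
    h = (0.5 - u0) / steps          # Simpson's rule on [u0, 1/2]; steps is even
    total = f(u0) + f(0.5)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(u0 + i * h)
    return 0.5 + total * h / 3.0

def b(j, n, theta):
    """b_j: integral of f_j over (1/theta, infinity)."""
    return beta_tail(0.5 * n + j, 1.0 / (1.0 + theta))

def increment(j, n, theta):
    """Closed form of b_j - b_{j-1} for j >= 1."""
    m = 0.5 * n + j - 1.0
    z0 = 1.0 / theta
    logc = math.lgamma(2.0 * m) - math.lgamma(m) - math.lgamma(m + 1.0)
    return math.exp(logc + m * math.log(z0)
                    - (2.0 * m + 1.0) * math.log(1.0 + z0)) * (1.0 - z0)
```

For example, with n = 4 and $\theta^* = 2$ the values `b(0, 4, 2.0)`, `b(1, 4, 2.0)`, ... increase, and each difference agrees with `increment`.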


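If the experimenter uses Rule BS, we obtain,

Theorem 2.1. The least favorable configuration of the relevant parameters is $\sigma_{[2]}^2 = \theta^*\sigma_{[1]}^2$, $\rho = 0$, yielding,

(2.6)  $$\inf_{\Omega} PCS(\sigma,\rho) = \frac{\Gamma(n)}{(\Gamma(\frac{n}{2}))^2} \int_{1/\theta^*}^{\infty} z^{\frac{n}{2}-1}(1+z)^{-n}\,dz .$$

Proof. If $\rho = \pm 1$, we have $PCS(\sigma,\rho) = 1$. Indeed, in this case,

$$X_{2\alpha} - \mu_2 = b(X_{1\alpha} - \mu_1) \ \text{a.e.} \quad (1 \le \alpha \le N),$$

where $b = \rho\sigma_2/\sigma_1$. Hence,

$$\sum_{\alpha=1}^{N} (X_{2\alpha} - \bar X_2)^2 = b^2 a_{11} \ \text{a.e.},$$

resulting in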
Bechhofer and Sobel (1954) provide a table of values of the integral on the right-hand side of (2.6). For $\theta^*$ and P* specified, the experimenter uses the table to determine N = n + 1.

Lemma 2.4. Consider a loss function $L_i(\sigma,\rho)$ = loss when component i is selected and $(\sigma,\rho)$ are the parameters, such that,

(i) $L_i(\sigma,\rho) \le L_j(\sigma,\rho)$ when $\sigma_j^2 > \sigma_i^2$;

(ii) $0 \le L_i(\sigma,\rho) = L_{\pi i}(\pi\sigma,\rho)$, where $\pi(\sigma_1^2,\sigma_2^2) = (\sigma_{\pi 1}^2,\sigma_{\pi 2}^2)$ is any permutation of $(\sigma_1^2,\sigma_2^2)$.

Then Rule BS is minimax and admissible, uniformly minimizing the risk function among all invariant (under permutations of components) procedures.

Proof. Since $c_j(\rho) \ge 0$ for all $\rho$, and since the gamma densities appearing in (2.3) have monotone likelihood ratio, invoking a result of Eaton (1967a) (a generalization of a theorem of Bahadur and Goodman (1952)), the conclusion follows at once. QED




If the experimenter uses Rule GS, we have,

Theorem 2.2. The least favorable configuration of the relevant parameters is $\sigma_{[1]}^2 = \sigma_{[2]}^2$, $\rho = 0$, yielding,

(2.7)  $$\inf PCS(\sigma,\rho) = \frac{\Gamma(n)}{(\Gamma(\frac{n}{2}))^2} \int_{1/d^*}^{\infty} z^{\frac{n}{2}-1}(1+z)^{-n}\,dz .$$

Proof. The proof parallels that of Theorem 2.1.

Lemma  2.5.  If  S  denotes  the  size  of  the  selected  subset  when 
Rule  GS  is  employed,  then 

(a)  E(Sli.P) 

where $v_1$ and $v_2$ are both distributed as in (2.4).

(b) $\sup_{\sigma,\rho} E(S|\sigma,\rho) = 2$, when $\sigma_1^2 = \sigma_2^2$, $\rho = 1$.


Proof.  The  result  follows  easily  from  previous  developments. 


2.3. Case k ≥ 3

In this section we develop some preliminary results for the case k ≥ 3, and outline some of the difficulties encountered. We have not been able to obtain definitive general small-sample results when k ≥ 3. Unfortunately, the method employed for k = 2 in Section 2.2 fails here. In particular, it is easy to develop similar results to those given as Lemmas 2.1 and 2.2, but there is very strong evidence that the least favorable configuration of R depends on N and $\theta^*$ (or $d^*$). This will be seen in Section 2.5, where we develop complete asymptotic (N → ∞) results.

If Rule BS is used, without loss of generality, we assume $\sigma_1^2 = 1$, $\sigma_j^2 = \theta^*$, $j \ne 1$, since this is a least favorable configuration of the variances. Indeed, in $\Omega$, we have,

$$PCS = P(a_{11} < a_{jj},\ j \ne 1) = P(\sigma_1^2\chi_n^2(1) < \sigma_j^2\chi_n^2(j),\ j \ne 1) \ge P(\chi_n^2(1) < \theta^*\chi_n^2(j),\ j \ne 1),$$

where $\chi_n^2(j)$ $(1 \le j \le k)$ are the diagonal elements of a Wishart matrix with mean nR.

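We define

$$\Sigma = \begin{pmatrix} 1 & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \qquad A = \begin{pmatrix} a_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$

where

$$A = \sum_{\alpha=1}^{N} (x_\alpha - \bar x)(x_\alpha - \bar x)^t ,$$

and $\Sigma_{22}$ and $A_{22}$ are $(k-1) \times (k-1)$ symmetric positive definite matrices. Then, the following lemma is stated in a slightly different form in Johnson and Kotz (1972), p. 223. It provides a convenient representation for the distribution function of the diagonal elements of A.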


Lemma  2.6.  The  conditional  distribution  of  a^  given  A22  is 
2 

noncentral  ^  »  with  noncentrality  parameter 


X 


E12!:22A22E22E21 


3] 


In other words,


00  -  \  j 

(•)■  le-jh p2  (•> 

j-0  J-  X2 


If $p_{A_{22}}(\cdot)$ denotes the density of the Wishart matrix $A_{22}$, we have

(2.8)  $$PCS = \int_{W>0} P(a_{11} < a_{jj},\ j \ne 1 \mid A_{22} = W)\, p_{A_{22}}(W)\,dW$$


mm  a .  . 

,31*1 


e~xxk 

w>0  A2 2  k*0  k!  0 


-  /  Pa  W  l 


p  (u)  du  dW  , 
Vi 


where  W  >  0  means  W  symmetric  positive  definite. 

Using (2.8), a tedious but straightforward computation shows that

$$\frac{\partial PCS}{\partial \rho_{ij}} = 0 \ (i \ne j), \ \text{at } R = I_k .$$

One might conjecture, in view of this last result, and the results of the previous section, that $R = I_k$ is a least favorable configuration of R. However, we do not believe this to be the case for k > 2. In fact, we shall prove in Section 2.5, using asymptotic (N → ∞) distribution theory, that $R = I_k$ can be a saddle-point of the PCS. It approaches a global minimum when $c_{\theta^*}(n) = (1/2)n^{1/2}\log\theta^* \to \infty$. When the experimenter knows that the off-diagonal elements of R are equal, then we show in Section 2.5, using asymptotic theory, that $R = I_k$ is a least favorable configuration, which does not depend on N and $\theta^*$. In other words, we are facing a situation similar to the one encountered in Chapter 1, where the least favorable configuration varies with the sample size.

The  same  remarks  are  valid  when  Rule  GS  is  used. 

2.4.  A  conservative  approximation  to  the  sample  size 

In view of the difficulty of determining a least favorable configuration of R for k ≥ 3, the following Bonferroni approximation (cf. Section 1.6 of Chapter 1) can be used to determine a value of N, which will be larger than the minimum N required to guarantee the probability requirement.

Lemma 2.7. If Rule BS is used,

(2.9)  $$\inf_{\Omega} PCS(\sigma,R) \ge (k-1)\,\frac{\Gamma(n)}{(\Gamma(\frac{n}{2}))^2} \int_{1/\theta^*}^{\infty} z^{\frac{n}{2}-1}(1+z)^{-n}\,dz - (k-2) .$$

Hence Bechhofer and Sobel's (1954) table may be used to determine a conservative value of N = n + 1.

If Rule GS is used, a similar approximation is available, replacing $\theta^*$ by $d^*$ in (2.9).
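As an alternative to the table, the bound (2.9) can be evaluated directly. The following is a minimal sketch with hypothetical function names; it assumes the substitution $u = z/(1+z)$, under which the integral in (2.9) becomes the upper tail of a Beta(n/2, n/2) distribution beyond $1/(1+\theta^*)$. For k = 2 the bound reduces to (2.6).

```python
import math

def beta_tail(m, u0, steps=2000):
    """P(U > u0) for U ~ Beta(m, m), 0 < u0 <= 1/2, using symmetry about 1/2."""
    logc = math.lgamma(2.0 * m) - 2.0 * math.lgamma(m)
    f = lambda u: math.exp(logc + (m - 1.0) * (math.log(u) + math.log(1.0 - u)))
    h = (0.5 - u0) / steps          # Simpson's rule on [u0, 1/2]; steps is even
    total = f(u0) + f(0.5)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(u0 + i * h)
    return 0.5 + total * h / 3.0

def bonferroni_lower_bound(k, n, theta):
    """Right-hand side of (2.9); use theta = d* for Rule GS."""
    return (k - 1) * beta_tail(0.5 * n, 1.0 / (1.0 + theta)) - (k - 2)

def conservative_N(k, theta, p_star, n_max=10000):
    """Smallest N = n + 1 for which the bound (2.9) is at least P*."""
    for n in range(1, n_max + 1):
        if bonferroni_lower_bound(k, n, theta) >= p_star:
            return n + 1
    raise ValueError("no n <= n_max meets the requirement")
```

A call such as `conservative_N(3, 3.0, 0.4)` searches n = 1, 2, ... and returns the first N = n + 1 that satisfies the requirement; the resulting N is conservative in the sense stated above.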

2.5.  Large-sample  theory 

In  this  section  we  develop  a  large-sample  theory  for  the 
problems  considered  in  Section  2.1.  One  of  the  results  obtained 
(Theorem  2.3)  will  be  used  in  the  next  two  chapters  as  an  important 
tool for obtaining large-sample results. We start with a version of the Central Limit Theorem, stated and proved in Anderson (1958), p. 75.

Lemma 2.8. Let $X_\alpha$, $\alpha = 1,2,\ldots$, be a sequence of independent k-dimensional normal vectors, each with mean vector $\mu$ and covariance matrix $\Sigma = (\sigma_{ij})$. Let

$$B(n) = (b_{ij}(n)) = n^{1/2}\Big\{(1/n)\sum_{\alpha=1}^{N} (X_\alpha - \bar X_N)(X_\alpha - \bar X_N)^t - \Sigma\Big\},$$

where

$$\bar X_N = \sum_{\alpha=1}^{N} X_\alpha/N, \quad n = N - 1 .$$

Then the asymptotic (N → ∞) distribution (a.d.) of B(n) is multivariate normal, with zero means, and covariances

$$E(b_{ij}(n)\,b_{kl}(n)) = \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk} .$$

Another tool that will be used extensively is given below as a lemma, the proof of which may be found, for example, in Rao (1968), Chapter 6.

Lemma 2.9. Let $(Y_{1n},\ldots,Y_{kn})$, $n = 1,2,\ldots$, be a sequence of not necessarily independent vector variates, such that,

$$n^{1/2}(Y_{1n} - \theta_1^0, \ldots, Y_{kn} - \theta_k^0)$$

has multivariate normal asymptotic distribution with zero means and covariance matrix $\Sigma$. Let $g_1,\ldots,g_r$ be real functions defined on $E_k$, the k dimensional Euclidean space, which are differentiable in a neighborhood of $\theta^0 = (\theta_1^0,\ldots,\theta_k^0)$. Then the a.d. of

$$n^{1/2}\{g_j(Y_{1n},\ldots,Y_{kn}) - g_j(\theta_1^0,\ldots,\theta_k^0)\} \quad (1 \le j \le r),$$

is multivariate normal, with zero means, and covariance matrix,

$$\Sigma(\theta^0) = ((\nabla g_i)^t\,\Sigma\,(\nabla g_j)),$$

where

$$\nabla g_j = \Big(\frac{\partial g_j}{\partial\theta_1}, \ldots, \frac{\partial g_j}{\partial\theta_k}\Big)^t \text{ evaluated at } \theta^0 \quad (1 \le j \le r),$$

if $\Sigma(\theta^0)$ is nonsingular.

We shall use the notation introduced in Section 2.1, and assume, without loss of generality, that $\sigma_1^2 < \sigma_j^2$ $(j \ne 1)$.

Lemma 2.10. The a.d. of

(2.10)  $$Y_j = n^{1/2}\Big\{\frac{\log(a_{11}/a_{jj}) - \log(\sigma_1^2/\sigma_j^2)}{2(1-\rho_{1j}^2)^{1/2}}\Big\} \quad (j \ne 1),$$

is standard multivariate normal, with correlations,

$$\mathrm{corr}(Y_i,Y_j) \equiv \gamma_{ij} = \frac{1 - \rho_{1i}^2 - \rho_{1j}^2 + \rho_{ij}^2}{2(1-\rho_{1i}^2)^{1/2}(1-\rho_{1j}^2)^{1/2}} \quad (i \ne j) .$$


Proof. From Lemma 2.8, the a.d. of

$$n^{1/2}(a_{ii}/n - \sigma_i^2) \quad (1 \le i \le k),$$

is multivariate normal with zero means, variances equal to $2\sigma_i^4$ and covariances $2\sigma_{ij}^2$. Therefore, using a variance stabilizing transformation (cf. Bartlett and Kendall (1946) and Lemma 2.9), we have that

$$n^{1/2}(\log(a_{ii}/n) - \log\sigma_i^2) \quad (1 \le i \le k),$$

has a multivariate normal a.d. with zero means, variances equal to 2, and covariances equal to $2\rho_{ij}^2$. Finally, (2.10) is obtained using Lemma 2.9 once again. QED

The  proof  of  the  above  lemma  is  essentially  contained  in 
Bamberg  (1969).  Note  that  (2.10)  resembles  the  distributions  of  the 
previous  chapter  (cf.  Lemma  1.1). 
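For a given correlation matrix R, the asymptotic correlations of Lemma 2.10 are straightforward to tabulate. The helper below is an illustrative sketch with a hypothetical name; it assumes variate 1 is stored at index 0 and that $\rho_{1j}^2 \ne 1$ for all j.

```python
import math

def gamma_matrix(R):
    """Correlation matrix of (Y_2, ..., Y_k) in (2.10).

    R is a k x k correlation matrix given as a list of lists;
    the returned matrix is (k-1) x (k-1).
    """
    k = len(R)
    G = [[1.0] * (k - 1) for _ in range(k - 1)]
    for i in range(1, k):
        for j in range(1, k):
            if i != j:
                num = 1.0 - R[0][i] ** 2 - R[0][j] ** 2 + R[i][j] ** 2
                den = 2.0 * math.sqrt((1.0 - R[0][i] ** 2) * (1.0 - R[0][j] ** 2))
                G[i - 1][j - 1] = num / den
    return G
```

When R is the identity matrix every off-diagonal entry equals 1/2, which is the common correlation induced by the shared term $\log a_{11}$.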

Let $PCS_a$ denote probability of a correct selection when an asymptotic (N → ∞) distribution function is used. The following is an important result for our purposes.

Theorem 2.3. If the experimenter uses Rule BS, the asymptotic (N → ∞) least favorable configuration of the relevant parameters is

$$\theta^*\sigma_{[1]}^2 = \sigma_{[2]}^2 = \cdots = \sigma_{[k]}^2, \quad \rho_{ij} = 0 \ (i \ne j) .$$

Therefore,

(2.11)  $$\inf_{\Omega} PCS_a(\sigma,R) = P(Y_j < n^{1/2}(1/2)\log\theta^*,\ j \ne 1),$$

where the $\{Y_j,\ j \ne 1\}$ are distributed as in (2.10) with $\gamma_{ij} = 1/2$ $(i \ne j)$.
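The right-hand side of (2.11) is an equicoordinate probability for k - 1 standard normal variates with common correlation 1/2, which reduces to a one-dimensional integral. The sketch below (hypothetical function names, Simpson's rule) assumes the representation $Y_j = (Z_j + Z_0)/\sqrt{2}$ with independent standard normal $Z_0, Z_2, \ldots, Z_k$.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pcs_asymptotic(k, n, theta, steps=4000):
    """Right-hand side of (2.11) with c = n^(1/2) (1/2) log(theta)."""
    c = 0.5 * math.sqrt(n) * math.log(theta)
    # Given Z_0 = z, the events {Y_j < c} are independent, each with
    # probability Phi(sqrt(2) c - z).
    f = lambda z: phi(z) * Phi(math.sqrt(2.0) * c - z) ** (k - 1)
    lo, hi = -10.0, 10.0            # stands in for (-infinity, infinity)
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def smallest_n(k, theta, p_star, n_max=100000):
    """Smallest n with asymptotic infimum of PCS at least P*."""
    for n in range(1, n_max + 1):
        if pcs_asymptotic(k, n, theta) >= p_star:
            return n
    raise ValueError("no n <= n_max meets the requirement")
```

Two simple checks follow from the representation: for k = 2 the value is $\Phi(c)$, and for k = 3 with c = 0 it is 1/3. For Rule GS the same routine applies with $d^*$ in place of $\theta^*$.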
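Proof. In $\Omega$, if $\sigma_1^2 < \sigma_j^2$ $(j \ne 1)$,

$$PCS_a(\sigma,R) = P(a_{11} < a_{jj},\ j \ne 1) = P\Big(Y_j < \frac{n^{1/2}\log(\sigma_j^2/\sigma_1^2)}{2(1-\rho_{1j}^2)^{1/2}},\ j \ne 1\Big)$$
$$\ge P(Y_j < n^{1/2}(1/2)(\log\theta^*)(1-\rho_{1j}^2)^{-1/2},\ j \ne 1)$$
$$= \Phi_{(\gamma_{ij})}(c_{\theta^*}(n)(1-\rho_{12}^2)^{-1/2}, \ldots, c_{\theta^*}(n)(1-\rho_{1k}^2)^{-1/2}),$$

where,

$$c_{\theta^*}(n) = n^{1/2}(1/2)\log\theta^*,$$

and

$$\Phi_{(\gamma_{ij})}(c_2(n),\ldots,c_k(n)) = \int_{-\infty}^{c_2(n)} \cdots \int_{-\infty}^{c_k(n)} f_{(\gamma_{ij})}(y_2,\ldots,y_k)\,dy_2 \cdots dy_k ,$$

$$c_j(n) = c_{\theta^*}(n)(1-\rho_{1j}^2)^{-1/2} \quad (j \ne 1),$$

and $f_{(\gamma_{ij})}(y_2,\ldots,y_k)$ is the p.d.f. of the $\{Y_j,\ j \ne 1\}$. Since,

$$\frac{\partial\Phi_{(\gamma_{ij})}}{\partial\rho_{kl}} = 2\rho_{kl}\,\frac{\partial\Phi_{(\gamma_{ij})}}{\partial\rho_{kl}^2} = 0$$

if all $\rho_{kl} = 0$, it follows that the correlation matrix R = I is a stationary point of $\Phi_{(\gamma_{ij})}$. We must show that it is a point of global minimum as $c_{\theta^*}(n) \to \infty$.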

Q* 
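First assume $\rho_{1j}^2 \ne 1$ $(j \ne 1)$. We will prove that

$$\frac{\partial\Phi_{(\gamma_{ij})}}{\partial\rho_{lm}^2} > 0 \quad (l,m > 1) .$$

Without loss of generality, consider l = 2, m = 3.

(2.12)  $$\frac{\partial\Phi_{(\gamma_{ij})}}{\partial\rho_{23}^2} = \frac{\partial\gamma_{23}}{\partial\rho_{23}^2} \int_{-\infty}^{c_4(n)} \cdots \int_{-\infty}^{c_k(n)} f_{(\gamma_{ij})}(c_2(n),c_3(n),y_4,\ldots,y_k)\,dy_4 \cdots dy_k > 0$$

since

$$\frac{\partial\gamma_{23}}{\partial\rho_{23}^2} = (1/2)(1-\rho_{12}^2)^{-1/2}(1-\rho_{13}^2)^{-1/2} > 0 .$$

Next, we show that, as $c_{\theta^*}(n) \to \infty$,

(2.13)  $$\frac{\partial\Phi_{(\gamma_{ij})}}{\partial\rho_{1p}^2} > 0 \quad (p \ne 1) .$$

Without loss of generality, take p = 2. Since $\rho_{12}^2$ appears in the expressions of $\gamma_{23},\ldots,\gamma_{2k}$, we have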
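(2.14)  $$\frac{\partial\Phi_{(\gamma_{ij})}}{\partial\rho_{12}^2} = \sum_{j=3}^{k} \frac{\partial\Phi_{(\gamma_{ij})}}{\partial\gamma_{2j}}\,\frac{\partial\gamma_{2j}}{\partial\rho_{12}^2} + \left.\frac{\partial\Phi_{(\gamma_{ij})}}{\partial\rho_{12}^2}\right|_{\gamma_{ij}\ \text{fixed}}$$

$$= \sum_{j=3}^{k} \frac{\partial\gamma_{2j}}{\partial\rho_{12}^2} \int_{-\infty}^{c_3(n)} \cdots \int_{-\infty}^{c_{j-1}(n)} \int_{-\infty}^{c_{j+1}(n)} \cdots \int_{-\infty}^{c_k(n)} f_{(\gamma_{ij})}(c_2(n),y_3,\ldots,y_{j-1},c_j(n),y_{j+1},\ldots,y_k)\,dy_3 \cdots dy_{j-1}\,dy_{j+1} \cdots dy_k$$

$$\quad + \frac{\partial c_2(n)}{\partial\rho_{12}^2} \int_{-\infty}^{c_3(n)} \cdots \int_{-\infty}^{c_k(n)} f_{(\gamma_{ij})}(c_2(n),y_3,\ldots,y_k)\,dy_3 \cdots dy_k$$

$$= \sum_{j=3}^{k} \frac{\partial\gamma_{2j}}{\partial\rho_{12}^2}\,M_{2j} + \frac{\partial c_2(n)}{\partial\rho_{12}^2}\,Q_2 ,$$

where

$$\frac{\partial c_2(n)}{\partial\rho_{12}^2} = c_{\theta^*}(n)(1/2)(1-\rho_{12}^2)^{-3/2} \to \infty \ \text{as } c_{\theta^*}(n) \to \infty ;$$

(2.15)  $$\frac{\partial\gamma_{2j}}{\partial\rho_{12}^2} = \frac{\rho_{12}^2 + \rho_{2j}^2 - \rho_{1j}^2 - 1}{4(1-\rho_{1j}^2)^{1/2}(1-\rho_{12}^2)^{3/2}} = \frac{\lambda_{12j}}{4(1-\rho_{1j}^2)^{1/2}(1-\rho_{12}^2)^{3/2}} .$$

Since $M_{2j} > 0$ (j ≥ 3) and $Q_2 > 0$, we only have to consider situations where $\partial\gamma_{2j}/\partial\rho_{12}^2 < 0$ for some j. Suppose, for instance, that $\partial\gamma_{23}/\partial\rho_{12}^2 < 0$. This is equivalent to $\lambda_{123} < 0$. We will show that $Q_2 - M_{23} > 0$, which proves (2.13), in view of (2.15). A straightforward computation leads to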
A  straightforward  computation  leads  to 
$$Q_2 - M_{23} = f(c_2(n)) \int_{-\infty}^{c_4(n)-\gamma_{24}c_2(n)} \cdots \int_{-\infty}^{c_k(n)-\gamma_{2k}c_2(n)} \Big\{\int_{-\infty}^{c_3(n)-\gamma_{23}c_2(n)} f(z_3,\ldots,z_k)\,dz_3 - f(c_3(n)-\gamma_{23}c_2(n),z_4,\ldots,z_k)\Big\}\,dz_4 \cdots dz_k ,$$

where

$$f(c_2(n)) = \frac{1}{(2\pi)^{1/2}}\exp\Big\{-\frac{c_2^2(n)}{2}\Big\}$$

and f is the density of a multivariate normal distribution with zero means and covariance matrix which depends only on the $\gamma_{ij}$. Since $\lambda_{123} < 0$, we have

$$c_3(n) - \gamma_{23}c_2(n) = \frac{c_{\theta^*}(n)(-\lambda_{123})}{2(1-\rho_{13}^2)^{1/2}(1-\rho_{12}^2)} \to \infty \ \text{as } c_{\theta^*}(n) \to \infty ,$$

showing that $Q_2 - M_{23} > 0$. This, in turn, implies (2.13).
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Equations  (2.12)  and  (2.13)  imply  that 


*(Y.  )U2(n),...,ck(n))  1  ^(i/2) (ce*(n) * •  * ' 'ce*Cn)) 


ij 


This  last  lower  bound  is  achieved  when  p_  =  0  (i  /  j)  , 

2 

proving  (2.11)  when  ^  1  (j  ^  1)  . 

2 

Let $J = \{j_1, \ldots, j_r\}$, $1 \notin J$. Assume that $\rho_{1j}^2 = 1$, $j \in J$.

Using  an  argument  similar  to  the  one  employed  in  the  proof  of  Theorem 
2.1,  we  have, 


an  =  a-e-  6  J) 

j  j  • 


2  2 

Therefore,  for  such  an  R  ,  if  Oj  <  (j  #  1)  , 


2,  2, 


PCS  =  P(  n  (a  <a  ))  =  P(  n  (o;<ot)  ,  n  (a <a  )) 
j>i  11  33  ,i€J  3  jiJ  33 

=  P(  n  (a  <a  .) ) 


n  jj 


Since,  for  any  R  , 


PC  n  (a„  <  a..))  >  P(  n  (a  <  a..)) 


11  jj 


j>l 


11  jj 


L 
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2 

it  is  seen  that  p^  =  1  ,  j  €  J  does  not  lead  to  the  infimum.  QED 

Theorem 2.4. If the experimenter uses Rule GS, the asymptotic $(N \to \infty)$ least favorable configuration of the relevant parameters is

$$\sigma_1^2 = \cdots = \sigma_k^2 , \quad \rho_{ij}^2 = 0 \quad (i \ne j) .$$

Therefore,

$$\inf_{\sigma, R} \mathrm{PCS}_a(\sigma, R) = P(Y_j^1 \le n^{1/2}(1/2) \log d^* ,\ j \ne 1) , \tag{2.16}$$

where the $\{Y_j^1 ,\ j \ne 1\}$ are as in Theorem 2.3.

Proof. The proof is similar to the proof of Theorem 2.3.

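Lemma 2.11. If $S$ denotes the size of the selected subset when Rule GS is employed, we have asymptotically $(N \to \infty)$,

(a) $$E_a(S \mid \sigma, R) = \sum_{i=1}^{k} P\left(Y_j^i \le n^{1/2}(1/2)(1 - \rho_{ij}^2)^{-1/2} \log(d^* \sigma_j^2 / \sigma_i^2) ,\ j \ne i\right) ,$$

where the $\{Y_j^i ,\ j \ne i\}$ are distributed as in (2.10) with $i$ in place of $1$.

(b) $\sup E_a(S \mid \sigma, R) = k$ when $\sigma_i^2 = \sigma_j^2$ $(i \ne j)$ and

$$R = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix} .$$

Proof. Consequence of previous developments.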


Theorem 2.5. Suppose that $\rho_{ij} = \rho$ $(i \ne j)$, where $\rho$ is unknown. Then,

(a) If Rule BS is used, an asymptotic $(N \to \infty)$ least favorable configuration of the relevant parameters is

$$\rho = 0 ,$$

(2.11) being pertinent;

(b) If Rule GS is used, an asymptotic $(N \to \infty)$ least favorable configuration of the relevant parameters is

$$\sigma_1^2 = \cdots = \sigma_k^2 , \quad \rho = 0 ,$$

(2.16) being true.

Proof. This follows directly from Lemma 2.10, without the need for further arguments.

There  is  evidence  that  the  approximation  used  in  Lemma  2.10 
is  very  good,  even  for  small  values  of  N  .  The  reader  may  consult 
Bechhofer  and  Sobel's  (1954)  tables  where  some  comparisons  are  given. 
Hence,  we  would  conjecture  that  Theorem  2.5  provides  an  excellent 
approximation  to  N  ,  even  for  relatively  small  values  of  N  .  As 
for  Theorems  2.3  and  2.4,  we  have  used  the  fact  that  N  is  large 
in  a  stronger  manner,  but  still  it  is  expected  that  moderate  values  of 
N  would  provide  a  very  good  approximation  to  the  small  sample 
results. We would expect that the approximation will be an excellent one if $P^*$ and $\theta^*$ are close to unity. Values of $N$ may be determined using formulae (2.11) and (2.16) in conjunction with the tables of Gupta (1963) or Milton (1963).
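
Where tables are not at hand, the determination of $N$ can also be sketched numerically. The following is a minimal illustration, not part of the original development: it assumes that the right-hand side of (2.16) (or of the analogous indifference-zone formula) is the probability that $k - 1$ standard normal variables with common correlation $1/2$ all fall below $n^{1/2}(1/2)\log(\text{ratio})$, where `ratio` stands for $\theta^*$ or $d^*$. The function names and the integration range are choices made for this sketch.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def equicorrelated_orthant(c: float, m: int, steps: int = 4000) -> float:
    """P(Y_1 <= c, ..., Y_m <= c) for standard normal Y_j with correlation 1/2.

    Writes Y_j = (Z_j - Z_0) / sqrt(2) with independent standard normal Z's,
    so the probability equals E[Phi(sqrt(2) c + Z_0)^m]; the one-dimensional
    integral is evaluated with Simpson's rule on [-8, 8].
    """
    lo, hi = -8.0, 8.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * density * normal_cdf(math.sqrt(2.0) * c + z) ** m
    return total * h / 3.0


def smallest_sample_size(k: int, ratio: float, p_star: float) -> int:
    """Smallest N = n + 1 whose asymptotic bound reaches p_star.

    Assumes ratio > 1 and 1/k < p_star < 1, so the search terminates.
    """
    n = 1
    while equicorrelated_orthant(math.sqrt(n) * 0.5 * math.log(ratio), k - 1) < p_star:
        n += 1
    return n + 1
```

For example, `smallest_sample_size(4, 1.5, 0.90)` searches over $n$ for $k = 4$ variates, $\theta^* = 1.5$ and $P^* = 0.90$; the value returned is only as good as the asymptotic approximation discussed above.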


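We next explore the behavior of $\Phi_{(\gamma_{ij})}$ in the vicinity of $R = I_k$. It is easy to compute, from (2.12) and (2.14), that at $R = I_k$,

$$\left.\frac{\partial \Phi_{(\gamma_{ij})}}{\partial \rho_{lm}^2}\right|_{R = I_k} > 0 \quad (l > m > 1) ,$$

$$\left.\frac{\partial \Phi_{(\gamma_{ij})}}{\partial \rho_{1j}^2}\right|_{R = I_k} = -\frac{(k-2)}{4} \int_{-\infty}^{c_{\theta^*}(n)} \cdots \int_{-\infty}^{c_{\theta^*}(n)} f_{(1/2)}(c_{\theta^*}(n), c_{\theta^*}(n), y_4, \ldots, y_k)\, dy_4 \cdots dy_k + \frac{c_{\theta^*}(n)}{2} \int_{-\infty}^{c_{\theta^*}(n)} \cdots \int_{-\infty}^{c_{\theta^*}(n)} f_{(1/2)}(c_{\theta^*}(n), y_3, \ldots, y_k)\, dy_3 \cdots dy_k \quad (j > 1) .$$

Therefore,

$$\left.\frac{\partial \Phi_{(\gamma_{ij})}}{\partial \rho_{1j}^2}\right|_{R = I_k} < 0 \quad \text{if } c_{\theta^*}(n) = 0 ,$$

and

$$\left.\frac{\partial \Phi_{(\gamma_{ij})}}{\partial \rho_{1j}^2}\right|_{R = I_k} > 0 \quad \text{if } c_{\theta^*}(n) \to \infty .$$

In other words, if $c_{\theta^*}(n)$ is small enough, $\Phi_{(\gamma_{ij})}$ has a saddle-point at $R = I_k$, while as $c_{\theta^*}(n)$ increases it will have a local minimum there, and eventually a global minimum.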

Finally, we show that $\mathrm{PCS}_a$ can be less than $1/k$, as was also the case in Chapter 1. Note that $1/k$ is the lowest possible value for the PCS when $R = I$. Take $k = 3$, $\rho_{12}^2 = \rho_{13}^2 = 1/2$, $\rho_{23}^2 = 0$. Then, $\gamma_{23} = 0$ and

$$\mathrm{PCS}_a = \int_{-\infty}^{c_2(n)} \int_{-\infty}^{c_3(n)} f_{(0)}(y_2, y_3)\, dy_2\, dy_3 = 1/4 < 1/3$$

if $c_{\theta^*}(n) = 0$.
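
The arithmetic of this counterexample can be checked with a short sketch. It assumes the correlation formula $\gamma_{ij} = (1 - \rho_{1i}^2 - \rho_{1j}^2 + \rho_{ij}^2) / \{2(1-\rho_{1i}^2)^{1/2}(1-\rho_{1j}^2)^{1/2}\}$ and uses the classical closed form $1/4 + \arcsin(\gamma)/(2\pi)$ for the probability that two standard normal variables with correlation $\gamma$ are both negative; the function names are illustrative only.

```python
import math


def gamma(rho_1i_sq: float, rho_1j_sq: float, rho_ij_sq: float) -> float:
    """Asymptotic correlation gamma_ij between Y_i and Y_j (i, j != 1)."""
    return (1.0 - rho_1i_sq - rho_1j_sq + rho_ij_sq) / (
        2.0 * math.sqrt((1.0 - rho_1i_sq) * (1.0 - rho_1j_sq))
    )


def bivariate_orthant(corr: float) -> float:
    """P(Y_2 <= 0, Y_3 <= 0) for a standard bivariate normal pair."""
    return 0.25 + math.asin(corr) / (2.0 * math.pi)


# Configuration of the text: rho_12^2 = rho_13^2 = 1/2, rho_23^2 = 0.
g23 = gamma(0.5, 0.5, 0.0)
pcs_at_zero = bivariate_orthant(g23)      # value of PCS_a when the cutoff is 0
pcs_identity = bivariate_orthant(0.5)     # same quantity when R = I (gamma = 1/2)
```

Here `pcs_at_zero` falls below `pcs_identity`, which is the comparison made in the text for $k = 3$.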


CHAPTER  3 


SELECTION  OF  A  SUBCLASS  OF  VARIATES  WITH  THE  SMALLEST  POPULATION 
GENERALIZED  VARIANCE  FROM  A  SINGLE  MULTIVARIATE  NORMAL 
POPULATION  (ASYMPTOTIC  THEORY) 

3.0.  Introduction 

In  this  chapter  we  study  selection  procedures  in  terms  of 
population  generalized  variances  associated  with  subclasses  of  variates 
from  a  single  multivariate  normal  population.  In  Section  3.1  we 
consider  disjoint  subclasses  and  the  results  obtained  are  extensions 
of  the  results  of  Section  2.5  of  Chapter  2.  In  Section  3.2  we  consider 
intersecting  subclasses.  Many  other  selection  problems  in  terms  of 
generalized  variances  may  be  treated  using  the  ideas  of  the  present 
chapter.  We  decided  to  restrict  consideration  to  these  two  particular 
problems,  since  they  illustrate  well  the  methods  we  propose.  For 
instance,  it  is  easy  to  extend  these  results  to  selection  problems 
involving  subclasses  of  different  sizes.  Throughout  the  entire 
chapter,  the  theory  developed  is  asymptotic  (large-sample),  and  could 
be  stated  in  a  more  general  framework  than  normality. 

3.1.  Selecting  the  smallest  population  generalized  variance  (disjoint 
subclasses)  . 

Consider a kp-variate normal population, with unknown population mean vector and unknown population covariance matrix,

$$\Sigma = \begin{pmatrix} \Sigma_1 & \Sigma_{12} & \cdots & \Sigma_{1k} \\ \Sigma_{21} & \Sigma_2 & \cdots & \Sigma_{2k} \\ \vdots & \vdots & & \vdots \\ \Sigma_{k1} & \Sigma_{k2} & \cdots & \Sigma_k \end{pmatrix} ,$$

where the $\Sigma_j$ $(1 \le j \le k)$ are $p \times p$ symmetric positive definite matrices. The quantity $\det \Sigma_i$ is referred to as the population generalized variance associated with the $i$th subclass of variates $(1 \le i \le k)$. Let $\det \Sigma_{[1]} \le \cdots \le \det \Sigma_{[k]}$ be the ranked values of the $\det \Sigma_i$ $(1 \le i \le k)$. It is assumed that no prior knowledge exists concerning the values of $\det \Sigma_j$ $(1 \le j \le k)$, or of the pairing of $\det \Sigma_{[i]}$ with the subclasses of variates.


Indifference-zone formulation

The experimenter's goal is to select the subclass of variates associated with $\det \Sigma_{[1]}$. He specifies $\{\theta^*, P^*\}$, $\theta^* > 1$, $1/k < P^* < 1$, prior to the start of experimentation. If $\mathrm{PCS}_R(\Sigma)$ denotes the probability of a correct selection when decision procedure $R$ is employed, we restrict consideration to procedures $R$ which satisfy the probability requirement:

$$\inf_{\Omega} \mathrm{PCS}_R(\Sigma) \ge P^* ,$$

where

$$\Omega = \{\Sigma \mid \det \Sigma_j \ge \theta^* \det \Sigma_{[1]} ,\ j \ne [1]\} .$$


In this chapter we propose "natural" single-stage selection procedures, which associate sample quantities with the corresponding population parameters.

A sample of $N$ independent kp-vector observations, $X_\alpha^t = (X_{1\alpha}^t, \ldots, X_{k\alpha}^t)$ $(1 \le \alpha \le N)$, is taken, and the sufficient statistics $(\bar{X}_N, S)$,

$$\bar{X}_N = \sum_{\alpha=1}^{N} X_\alpha / N , \quad S = \sum_{\alpha=1}^{N} (X_\alpha - \bar{X}_N)(X_\alpha - \bar{X}_N)^t / n , \quad n = N - 1 ,$$

are obtained. Let $S$ be partitioned according to $\Sigma$, in such a way that $S_j$ corresponds to $\Sigma_j$ $(1 \le j \le k)$, and $S_{ij}$ to $\Sigma_{ij}$ $(i \ne j)$. For this indifference-zone goal, we adopt the following decision rule:

Rule $R_{GV1}$: Assert that the subclass associated with $\det S_{[1]} = \min_j \det S_j$ has the smallest population generalized variance, $\det \Sigma_{[1]}$.

Our objective is to determine the smallest sample size $N$ such that $R_{GV1}$ will guarantee the probability requirement.

When $\Sigma_{ij} = \Sigma^*$ $(i \ne j)$, $R_{GV1}$ is minimax, and also has uniformly smallest risk for a class of natural (invariant) decision procedures and loss functions (cf. Eaton (1967b)).


Subset  formulation 

If the experimenter wishes to select a subset of subclasses containing the subclass associated with $\det \Sigma_{[1]}$, he specifies $\{P^*\}$, $1/k < P^* < 1$, prior to the start of experimentation. If $\mathrm{PCS}_R(\Sigma)$ has the same meaning as above, we restrict consideration to decision procedures $R$ which satisfy the probability requirement:

$$\inf_{\Sigma} \mathrm{PCS}_R(\Sigma) \ge P^* .$$

We propose the following decision procedure for this subset goal:

Rule $R_{GV2}$: Include the subclass of variates associated with $S_j$ in the selected subset if $\det S_j \le d^* \det S_{[1]}$, where $d^* > 1$ is a specified constant.

Our objective is to find the smallest sample size $N$ which will guarantee the probability requirement when $R_{GV2}$ is employed.
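
The two rules are simple to state in code. The sketch below is only an illustration under stated assumptions: the sample covariance matrix $S$ is taken as a plain list of rows with the $k$ diagonal blocks of size $p$ in order, and the helper names (`det`, `diagonal_block`, `select_best`, `select_subset`) are invented for this example.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d


def diagonal_block(s, j, p):
    """The p x p diagonal block S_j (0-based j) of the kp x kp matrix s."""
    return [row[j * p:(j + 1) * p] for row in s[j * p:(j + 1) * p]]


def select_best(s, k, p):
    """Indifference-zone rule: index of the smallest sample generalized variance."""
    dets = [det(diagonal_block(s, j, p)) for j in range(k)]
    return dets.index(min(dets))


def select_subset(s, k, p, d_star):
    """Subset rule: keep subclass j when det S_j <= d_star * min_i det S_i."""
    dets = [det(diagonal_block(s, j, p)) for j in range(k)]
    smallest = min(dets)
    return [j for j, value in enumerate(dets) if value <= d_star * smallest]


# Illustrative 6 x 6 sample covariance matrix (k = 3 subclasses, p = 2).
S = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.2, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 2.0],
]
```

With this made-up matrix the three sample generalized variances are 1.0, 1.2 and 3.0, so a constant such as $d^* = 1.5$ retains the first two subclasses only.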

Gnanadesikan and Gupta (1970) studied $R_{GV2}$ for the case where $\Sigma_{ij} = 0$ $(i \ne j)$.

We disregard the population means in what follows, since they are irrelevant in our problems. We assume, without loss of generality, that $\det \Sigma_1 \le \det \Sigma_j$ $(j \ne 1)$.

The following linearization result, proved in Siotani and Hayakawa (1964), and which goes back to Olkin and Siotani (1964), will be used extensively in this and the following chapter.

Lemma 3.1. Let $S$ be as above, $\Sigma = (\sigma_{\alpha\beta})$, and $f_j(S)$, $j \in J$, $J$ a finite set, be real valued functions of $S$, not algebraically dependent, having first and second derivatives in a neighborhood of $\Sigma$ (in the topology inherited from Euclidean space). Then, the a.d. of

$$n^{1/2}(f_j(S) - f_j(\Sigma)) \quad (j \in J)$$

is multivariate normal with zero means, variances equal to $2(f_j(\Sigma))^2 \operatorname{tr}(\phi_j(\Sigma)\Sigma)^2$ $(j \in J)$, and covariances equal to $2 f_i(\Sigma) f_j(\Sigma) \operatorname{tr} \phi_i(\Sigma)\Sigma\phi_j(\Sigma)\Sigma$ $(i, j \in J)$, where

$$\phi_j(\Sigma) = (\phi^j_{\alpha\beta}(\Sigma)) , \quad \phi^j_{\alpha\beta}(\Sigma) = (1/2)(1 + \delta_{\alpha\beta}) \frac{\partial \log f_j(\Sigma)}{\partial \sigma_{\alpha\beta}} ,$$

$\delta_{\alpha\beta} = 1$ if $\alpha = \beta$, $= 0$ if $\alpha \ne \beta$.


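Lemma 3.2. The a.d. of

$$n^{1/2}(\det S_1 - \det \Sigma_1, \ldots, \det S_k - \det \Sigma_k)$$

is multivariate normal, with zero means, variances equal to $2p(\det \Sigma_j)^2$, and covariances $2 \det \Sigma_i \det \Sigma_j \operatorname{tr} \Sigma_i^{-1}\Sigma_{ij}\Sigma_j^{-1}\Sigma_{ji}$.

Proof. This lemma is a consequence of Lemma 3.1. Using the notation peculiar to that lemma, let $f_j(\Sigma) = \det \Sigma_j$. Then, it is known (cf. Anderson (1958), p. 347) that,

$$\phi_1(\Sigma) = \begin{pmatrix} \Sigma_1^{-1} & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} , \quad \phi_2(\Sigma) = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & \Sigma_2^{-1} & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} , \quad \ldots$$

Noticing that

$$\operatorname{tr}(\phi_1(\Sigma)\Sigma)^2 = \operatorname{tr} \begin{pmatrix} I_p & \Sigma_1^{-1}\Sigma_{12} & \cdots & \Sigma_1^{-1}\Sigma_{1k} \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}^2 = p ,$$

$$\operatorname{tr} \phi_1(\Sigma)\Sigma\phi_2(\Sigma)\Sigma = \operatorname{tr} \begin{pmatrix} I_p & \Sigma_1^{-1}\Sigma_{12} & \cdots & \Sigma_1^{-1}\Sigma_{1k} \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & \cdots & 0 \\ \Sigma_2^{-1}\Sigma_{21} & I_p & \cdots & \Sigma_2^{-1}\Sigma_{2k} \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} = \operatorname{tr} \Sigma_1^{-1}\Sigma_{12}\Sigma_2^{-1}\Sigma_{21} ,$$

the present lemma follows at once. QED

Hooper (1959) defined

$$\rho_{ij}^2 = (1/p) \operatorname{tr} \Sigma_i^{-1}\Sigma_{ij}\Sigma_j^{-1}\Sigma_{ji} ,$$

as the squared trace correlation coefficient between subclasses $i$ and $j$. If $\nu_{1ij}^2, \ldots, \nu_{pij}^2$ are the canonical correlations (cf. Anderson (1958)) between the two subclasses, it can be shown that

$$\rho_{ij}^2 = \sum_{l=1}^{p} \nu_{lij}^2 / p ,$$

which implies

$$0 \le \rho_{ij}^2 \le 1 .$$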
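
As a small numerical illustration of the squared trace correlation coefficient, the following sketch evaluates $(1/p)\operatorname{tr}\Sigma_i^{-1}\Sigma_{ij}\Sigma_j^{-1}\Sigma_{ji}$ for given blocks. It assumes the diagonal blocks are positive definite (so the inverses exist) and represents matrices as lists of rows; the helper names are invented for this example.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Inverse by Gauss-Jordan elimination with partial pivoting.

    Assumes m is nonsingular (true for positive definite blocks).
    """
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [row[n:] for row in a]


def trace_correlation_sq(sigma_i, sigma_ij, sigma_j):
    """Squared trace correlation between subclasses i and j."""
    p = len(sigma_i)
    sigma_ji = [list(col) for col in zip(*sigma_ij)]  # transpose of Sigma_ij
    prod = matmul(matmul(inverse(sigma_i), sigma_ij),
                  matmul(inverse(sigma_j), sigma_ji))
    return sum(prod[r][r] for r in range(p)) / p
```

When $p = 1$ the quantity is the ordinary squared correlation; with identity diagonal blocks and a diagonal $\Sigma_{ij}$ it is the average of the squared diagonal entries, in line with the canonical-correlation identity above.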


Lemma 3.3. The a.d. of

$$Y_j^1 = n^{1/2} \left\{ \frac{\log(\det S_j / \det S_1) - \log(\det \Sigma_j / \det \Sigma_1)}{2 p^{1/2}(1 - \rho_{1j}^2)^{1/2}} \right\} \quad (j \ne 1) , \tag{3.1}$$

is standard multivariate normal, with

$$\operatorname{corr}(Y_i^1, Y_j^1) = \gamma_{ij} = \frac{1 - \rho_{1i}^2 - \rho_{1j}^2 + \rho_{ij}^2}{2(1 - \rho_{1i}^2)^{1/2}(1 - \rho_{1j}^2)^{1/2}} \quad (i \ne j) .$$

Proof. This follows using Lemma 2.9 of Chapter 2 and Lemma 3.2 above.
Theorem 3.1. If the experimenter uses Rule $R_{GV1}$, an asymptotic $(N \to \infty)$ least favorable configuration of the relevant parameters is

$$\det \Sigma_j / \det \Sigma_1 = \theta^* \quad (j \ne 1) , \quad \Sigma_{ij} = 0 \quad (i \ne j) .$$

Therefore,

$$\inf_{\Omega} \mathrm{PCS}_a(\Sigma) = P(Y_j^1 \le n^{1/2}(1/2) p^{-1/2} \log \theta^* ,\ j \ne 1) , \tag{3.2}$$

where the $\{Y_j^1 ,\ j \ne 1\}$ are distributed as in (3.1) with $\gamma_{ij} = 1/2$ $(i \ne j)$.

Proof. Using Lemma 3.3, we have for $\det \Sigma_j \ge \theta^* \det \Sigma_1$ $(j \ne 1)$,

$$\begin{aligned}
\mathrm{PCS}_a(\Sigma) &= P(\det S_1 < \det S_j ,\ j \ne 1) \\
&= P(Y_j^1 < n^{1/2}(1/2) p^{-1/2}(1 - \rho_{1j}^2)^{-1/2} \log(\det \Sigma_j / \det \Sigma_1) ,\ j \ne 1) \\
&\ge P(Y_j^1 < n^{1/2}(1/2) p^{-1/2}(1 - \rho_{1j}^2)^{-1/2} \log \theta^* ,\ j \ne 1) .
\end{aligned}$$

We can now use Theorem 2.3 of Chapter 2, and set $\rho_{ij}^2 = 0$ $(i \ne j)$ to obtain a lower bound on $\mathrm{PCS}_a(\Sigma)$. Since the parameter configuration $\det \Sigma_j = \theta^* \det \Sigma_1$ $(j \ne 1)$, $\Sigma_{ij} = 0$ $(i \ne j)$, leads to this lower bound, it is a least favorable configuration.

It is easy to show that $\Sigma_{ij} = 0$ $(i \ne j)$ is also necessary for an asymptotic least favorable configuration. Indeed, if $\rho_{ij}^2 = 0$, then $\Sigma_{ij} = 0$ necessarily, from the definition of squared trace correlation coefficient. QED

Theorem 3.2. If the experimenter uses Rule $R_{GV2}$, an asymptotic $(N \to \infty)$ least favorable configuration of the relevant parameters is $\Sigma_1 = \cdots = \Sigma_k$, $\Sigma_{ij} = 0$ $(i \ne j)$. Therefore,

$$\inf_{\Sigma} \mathrm{PCS}_a(\Sigma) = P(Y_j^1 \le n^{1/2}(1/2) p^{-1/2} \log d^* ,\ j \ne 1) , \tag{3.3}$$

where the $\{Y_j^1 ,\ j \ne 1\}$ are as in Theorem 3.1.

Proof. The result follows immediately from Theorem 3.1.

Lemma 3.4. If $S$ denotes the size of the selected subset of subclasses of variates when $R_{GV2}$ is employed, we have

(a) $$E_a(S \mid \Sigma) = \sum_{i=1}^{k} P\left(Y_j^i \le n^{1/2}(1/2) p^{-1/2}(1 - \rho_{ij}^2)^{-1/2} \log(d^* \det \Sigma_j / \det \Sigma_i) ,\ j \ne i\right) ,$$

where the $\{Y_j^i ,\ j \ne i\}$ are distributed as in (3.1) with $1$ replaced by $i$.

(b) $\sup_{\Sigma} E_a(S \mid \Sigma) = k$, which occurs when

$$\Sigma = \begin{pmatrix} I_p & \cdots & I_p \\ \vdots & & \vdots \\ I_p & \cdots & I_p \end{pmatrix} .$$

Proof. The result is a consequence of previous developments.




It  is  interesting  to  note  that  Theorems  3.1  and  3.2,  and  Lemma 
3.4  reduce  to  Theorems  2.3  and  2.4,  and  Lemma  2.11  of  Chapter  2, 
respectively,  when  p  =  1  . 
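
The dependence of the required sample size on $p$ can be made explicit with a short worked step. This is a sketch under the assumption that $c^*$ denotes the solution of $\Phi_{(1/2)}(c^*, \ldots, c^*) = P^*$ (with $k - 1$ arguments), a symbol introduced here only for illustration.

```latex
% Sample size implied by (3.2); the same steps apply to (3.3) with d^* in place of \theta^*.
\begin{align*}
  P\bigl(Y_j^1 \le n^{1/2}(1/2)\,p^{-1/2}\log\theta^*,\ j \ne 1\bigr) \ge P^*
  &\iff n^{1/2}(1/2)\,p^{-1/2}\log\theta^* \ge c^* \\
  &\iff n \ge \frac{4\,p\,(c^*)^2}{(\log\theta^*)^2} .
\end{align*}
% Hence N = n + 1 with n the smallest integer satisfying the last inequality.
```

Up to rounding, the asymptotic value of $n$ for subclasses of size $p$ is therefore $p$ times the value obtained in the univariate case $p = 1$.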


3.2.  Selecting  the  smallest  population  generalized  variance 
( intersecting  subclasses) . 

The last problem of the present chapter is a problem of intersecting subclasses of variates, where some variates belong to more than a single subclass. We start by proving a lemma, which will be basic to what follows.

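Lemma 3.5. Let $X_\alpha^t = (X_{1\alpha}^t, X_{2\alpha}^t, X_{3\alpha}^t)$ $(1 \le \alpha \le N)$, be independent normally distributed $(p_1 + p_2 + p_3)$-vectors, with unknown population means and unknown population covariance matrix

$$\Sigma = \begin{pmatrix} A & D & E \\ D^t & B & F \\ E^t & F^t & C \end{pmatrix} ,$$

where $A(p_1 \times p_1)$, $B(p_2 \times p_2)$ and $C(p_3 \times p_3)$. Define,

$$X_{12,\alpha}^t = (X_{1\alpha}^t, X_{2\alpha}^t) , \quad X_{23,\alpha}^t = (X_{2\alpha}^t, X_{3\alpha}^t) \quad (1 \le \alpha \le N) ,$$

$$\bar{X}_{12} = \sum_{\alpha=1}^{N} X_{12,\alpha} / N , \quad \bar{X}_{23} = \sum_{\alpha=1}^{N} X_{23,\alpha} / N ,$$

$$S_{12} = \sum_{\alpha=1}^{N} (X_{12,\alpha} - \bar{X}_{12})(X_{12,\alpha} - \bar{X}_{12})^t / n , \quad n = N - 1 ,$$

$$S_{23} = \sum_{\alpha=1}^{N} (X_{23,\alpha} - \bar{X}_{23})(X_{23,\alpha} - \bar{X}_{23})^t / n ,$$

$$\Sigma_{12} = \begin{pmatrix} A & D \\ D^t & B \end{pmatrix} , \quad \Sigma_{23} = \begin{pmatrix} B & F \\ F^t & C \end{pmatrix} .$$

Then, the a.d. of

$$n^{1/2}(\det S_{12} - \det \Sigma_{12} ,\ \det S_{23} - \det \Sigma_{23})$$

is multivariate normal, with zero means, variances equal to $2(p_1 + p_2)(\det \Sigma_{12})^2$ and $2(p_2 + p_3)(\det \Sigma_{23})^2$, and covariance $2(p_2 + \lambda) \det \Sigma_{12} \det \Sigma_{23}$, where $\lambda \ge 0$ is defined in the course of the proof.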

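Proof. Using the notation of Lemma 3.1, we obtain,

$$f_{ij}(\Sigma) = \det \Sigma_{ij} \quad (1 \le i < j \le 3) ,$$

$$\phi_{12}(\Sigma) = \begin{pmatrix} \Sigma_{12}^{-1} & 0 \\ 0 & 0 \end{pmatrix} , \quad \phi_{23}(\Sigma) = \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_{23}^{-1} \end{pmatrix} .$$

The expressions for the variances follow immediately. Now we define,

$$\Sigma_{12}^{-1} = \begin{pmatrix} \Sigma_{12}^{(1)} \\ \Sigma_{12}^{(2)} \end{pmatrix} , \quad \Sigma_{12}^{(1)} = \bigl( (A - DB^{-1}D^t)^{-1} ,\ -(A - DB^{-1}D^t)^{-1}DB^{-1} \bigr) ,$$

$$\Sigma_{23}^{-1} = \begin{pmatrix} \Sigma_{23}^{(1)} \\ \Sigma_{23}^{(2)} \end{pmatrix} , \quad \Sigma_{23}^{(2)} = \bigl( -(C - F^tB^{-1}F)^{-1}F^tB^{-1} ,\ (C - F^tB^{-1}F)^{-1} \bigr) .$$

In order to compute the covariance we must evaluate

$$\begin{aligned}
\operatorname{tr} \phi_{12}(\Sigma)\Sigma\phi_{23}(\Sigma)\Sigma
&= \operatorname{tr} \begin{pmatrix} I_{p_1} & 0 & \Sigma_{12}^{(1)} \begin{pmatrix} E \\ F \end{pmatrix} \\ 0 & I_{p_2} & \Sigma_{12}^{(2)} \begin{pmatrix} E \\ F \end{pmatrix} \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 0 \\ \Sigma_{23}^{(1)} \begin{pmatrix} D^t \\ E^t \end{pmatrix} & I_{p_2} & 0 \\ \Sigma_{23}^{(2)} \begin{pmatrix} D^t \\ E^t \end{pmatrix} & 0 & I_{p_3} \end{pmatrix} \\
&= p_2 + \operatorname{tr} \Sigma_{12}^{(1)} \begin{pmatrix} E \\ F \end{pmatrix} \Sigma_{23}^{(2)} \begin{pmatrix} D^t \\ E^t \end{pmatrix} \\
&= p_2 + \operatorname{tr} (A - DB^{-1}D^t)^{-1}(E - DB^{-1}F)(C - F^tB^{-1}F)^{-1}(E^t - F^tB^{-1}D^t) \\
&= p_2 + \lambda .
\end{aligned}$$

Hooper (1962) defined

$$\rho_{13.2}^2 = \lambda / p_1 ,$$

as the squared partial trace correlation coefficient between subclasses $X_1$ and $X_3$ conditional on $X_2$. If $p_1 \le p_3$ (say) and $\nu_1^2, \ldots, \nu_{p_1}^2$ are the canonical correlations between $X_1$ and $X_3$ in the conditional distribution of $X_1$ and $X_3$ given $X_2$, it can be shown that,

$$\rho_{13.2}^2 = \sum_{l=1}^{p_1} \nu_l^2 / p_1 ,$$

which implies,

$$0 \le \rho_{13.2}^2 \le 1 .$$

Hence, $0 \le \lambda \le p_1$. QED

We note in passing that $\lambda = 0$ if $D$, $E$ and $F$ are zero matrices.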

In this section we consider $X^t = (X_1, \ldots, X_k)$, a k-variate normal population, with unknown population mean vector, and unknown population covariance matrix $\Sigma$. We assume $k \ge 3$. Consider all possible subclasses of specified size $t$ $(t < k)$ of $X$, whose total number is $U = \binom{k}{t}$. Let $\Sigma_1, \ldots, \Sigma_U$ be the covariance matrices (submatrices of $\Sigma$) corresponding to these $U$ subclasses, and let $\det \Sigma_{[1]} \le \cdots \le \det \Sigma_{[U]}$ be the ranked values of $\det \Sigma_j$ $(1 \le j \le U)$. It is assumed that the experimenter has no prior knowledge concerning the values of $\det \Sigma_j$ $(1 \le j \le U)$, or of the pairing of the $\det \Sigma_{[i]}$ with the subclasses of variates.
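
To make the bookkeeping concrete, the sketch below enumerates the $U = \binom{k}{t}$ subclasses of a covariance matrix and computes the generalized variance of each. It is an illustration only: the matrix `SIGMA` is invented, matrices are lists of rows, and the cofactor determinant is adequate only for the small $t$ of such examples.

```python
from itertools import combinations
from math import comb


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
        for c in range(len(m))
    )


def submatrix(sigma, idx):
    """Principal submatrix of sigma for the variates listed in idx."""
    return [[sigma[r][c] for c in idx] for r in idx]


def generalized_variances(sigma, t):
    """Map each subclass of t variates (0-based indices) to det of its submatrix."""
    k = len(sigma)
    return {idx: det(submatrix(sigma, idx)) for idx in combinations(range(k), t)}


# Illustrative covariance matrix with k = 3 variates.
SIGMA = [
    [1.0, 0.5, 0.0],
    [0.5, 2.0, 0.0],
    [0.0, 0.0, 3.0],
]

gv = generalized_variances(SIGMA, 2)
best = min(gv, key=gv.get)  # subclass with the smallest generalized variance
```

Applied to a sample covariance matrix $S$ instead of `SIGMA`, the same enumeration yields the $\det S_j$ on which the selection rules of this section operate.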

Indifference-zone  formulation 

The experimenter's goal is to select a subclass of $t$ variates out of the k-variate population, with the smallest population generalized variance, $\det \Sigma_{[1]}$. He specifies $\{\theta^*, P^*\}$, $\theta^* > 1$, $1/U < P^* < 1$, before experimentation starts. If $\mathrm{PCS}_R(\Sigma)$ is as defined in Section 3.1, we restrict consideration to decision procedures $R$ which guarantee the probability requirement:

$$\inf_{\Omega} \mathrm{PCS}_R(\Sigma) \ge P^*$$

where

$$\Omega = \{\Sigma \mid \theta^* \det \Sigma_{[1]} \le \det \Sigma_j ,\ j \ne [1]\} .$$

We propose the following decision procedure for this indifference-zone formulation of the problem:

Rule $R_{GV3}$: Let $S$ be the sample covariance matrix computed using a sample of size $N$ from the above population, as in Section 3.1. Let $S_i$ $(1 \le i \le U)$ be submatrices of $S$ corresponding to $\Sigma_i$ $(1 \le i \le U)$. Then assert that the subclass of $t$ variates associated with $\det S_{[1]} = \min_j \det S_j$ has the smallest population generalized variance, $\det \Sigma_{[1]}$.

Our task is to determine $N$ which will guarantee the probability requirement when $R_{GV3}$ is used.


Subset  formulation 

If the experimenter is interested in selecting a subset of subclasses of variates, which includes the subclass associated with $\det \Sigma_{[1]}$, he must specify $\{P^*\}$, $1/U < P^* < 1$, before experimentation starts. If $\mathrm{PCS}_R(\Sigma)$ has the same meaning as above, we restrict consideration to decision procedures which guarantee the probability requirement:

$$\inf_{\Sigma} \mathrm{PCS}_R(\Sigma) \ge P^* .$$

For this subset formulation, we propose,

Rule $R_{GV4}$: Include the subclass associated with $S_j$ $(1 \le j \le U)$, in the selected subset of subclasses of variates if $\det S_j \le d^* \det S_{[1]}$, where $d^* > 1$ is a specified constant.

Our task is to determine the smallest sample size $N$ which will guarantee the probability requirement when $R_{GV4}$ is employed.

Due to the symmetry of the present problem, we may assume, without loss of generality, that $\det \Sigma_1 = \det \Sigma_{[1]}$. Moreover, we give no consideration to population means in what follows, since they are irrelevant for our problems.

Lemma 3.6. The a.d. of

$$n^{1/2}(\det S_1 - \det \Sigma_1, \ldots, \det S_U - \det \Sigma_U)$$

is multivariate normal with zero means, variances equal to $2t(\det \Sigma_j)^2$ $(1 \le j \le U)$, and covariances $2(t_{ij} + \lambda_{ij}) \det \Sigma_i \det \Sigma_j$ $(1 \le i < j \le U)$, where $\lambda_{ij} \ge 0$ (defined in Lemma 3.5), and $t_{ij}$ is the number of common variates of subclasses $i$ and $j$ (corresponding to $\Sigma_i$ and $\Sigma_j$).

Proof. This result is a consequence of Lemma 3.5.

We define,

$$\rho_{ij}^2 = (t_{ij} + \lambda_{ij}) / t \quad (i \ne j) , \quad t_{ij}/t \le \rho_{ij}^2 \le 1 .$$


Lemma 3.7. The a.d. of

$$Y_j^1 = n^{1/2} \left\{ \frac{\log(\det S_j / \det S_1) - \log(\det \Sigma_j / \det \Sigma_1)}{2 t^{1/2}(1 - \rho_{1j}^2)^{1/2}} \right\} \quad (j \ne 1) , \tag{3.4}$$

is standard multivariate normal with

$$\operatorname{corr}(Y_i^1, Y_j^1) = \gamma_{ij} = \frac{1 - \rho_{1i}^2 - \rho_{1j}^2 + \rho_{ij}^2}{2(1 - \rho_{1i}^2)^{1/2}(1 - \rho_{1j}^2)^{1/2}} \quad (i \ne j) .$$

Proof. The result follows using Lemma 3.6 and Lemma 2.9 of the previous chapter.

Theorem 3.3. When one employs $R_{GV3}$, an asymptotic $(N \to \infty)$ lower bound on the $\mathrm{PCS}_a$ is given by:

$$\inf_{\Omega} \mathrm{PCS}_a(\Sigma) \ge P(Y_j^1 \le n^{1/2}(1/2) t^{-1/2} \log \theta^* ,\ j \ne 1) , \tag{3.5}$$

where the $\{Y_j^1 ,\ j \ne 1\}$ are distributed as in (3.4) with $\gamma_{ij} = 1/2$ $(i \ne j)$. When $t = 1$ (resp. $t = k - 1$) lower bound (3.5) is sharp, and an asymptotic least favorable configuration of the relevant parameters is $\Sigma = \operatorname{diag}(1, \theta^*, \ldots, \theta^*)$ (resp. $\Sigma = \operatorname{diag}(\theta^*, 1, \ldots, 1)$).

Proof. The proof of this theorem is similar to the proof of Theorem 2.3 of the previous chapter.

Theorem 3.4. When one employs $R_{GV4}$, an asymptotic $(N \to \infty)$ lower bound on the $\mathrm{PCS}_a$ is given by:

$$\inf_{\Sigma} \mathrm{PCS}_a(\Sigma) \ge P(Y_j^1 \le n^{1/2}(1/2) t^{-1/2} \log d^* ,\ j \ne 1) , \tag{3.6}$$

where the $\{Y_j^1 ,\ j \ne 1\}$ are as in Theorem 3.3.

When $t = 1$ or $t = k - 1$, lower bound (3.6) is sharp, and an asymptotic least favorable configuration of the relevant parameters is $\Sigma = \operatorname{diag}(1, \ldots, 1)$.

Proof. The proof of this theorem is similar to the proof of Theorem 2.3.

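Lemma 3.8. If $S$ denotes the size of the selected subset of subclasses when $R_{GV4}$ is used,

(a) $$E_a(S \mid \Sigma) = \sum_{i=1}^{k} P\left(Y_j^i \le n^{1/2} \frac{\log(d^* \det \Sigma_j / \det \Sigma_i)}{2 t^{1/2}(1 - \rho_{ij}^2)^{1/2}} ,\ j \ne i\right) ,$$

where the $\{Y_j^i ,\ j \ne i\}$ are as in (3.4) with $i$ in place of $1$.

(b) $\sup E_a(S \mid \Sigma) = k$, which occurs when

$$\Sigma = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix} .$$

Proof. The result is a consequence of Lemma 3.7.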


CHAPTER  4 


SELECTION  OF  SUBCLASSES  OF  VARIATES  OR  OF  POPULATIONS 
BASED  ON  MEASURES  OF  ASSOCIATION  BETWEEN  TWO 
SUBCLASSES  OF  VARIATES  (ASYMPTOTIC  THEORY) 

4.0.  Introduction 

In  the  present  chapter  we  consider  two  problems  which  have  been 
studied  recently  by  several  investigators.  We  provide  solutions  to 
these  problems  using  asymptotic  theory. 

Section  4.1  contains  certain  preliminaries  and  definitions 
employed  in  the  later  sections.  In  particular,  we  define  a  measure 
of  association  known  as  the  vector  coefficient  of  alienation  between 
two  classes  of  components.  Then,  in  Section  4.2,  we  consider  the 
problem  of  selecting  a  multivariate  normal  population  (among  independent 
populations) with the smallest vector coefficient of alienation between two classes of components. Gupta and Panchapakesan (1969)
and  Rizvi  and  Solomon  (1973)  give  different  formulations  for  this 
problem. 

In  Section  4.3,  we  consider  the  important  problem  of  selecting 
the  best  subclass  of  predictors  for  a  fixed  subclass  of  variates, 
each  of  the  contending  subclasses  being  correlated  with  the  subclass 
previously  specified.  A  quite  general  asymptotic  solution  is  displayed. 
The vector coefficient of alienation is used as a measure of association. Ramberg (1969) and Arvensen (1971) obtained partial results for related problems.

Although the problems are formulated in a multivariate normal framework, the same asymptotic results are valid for a very general class of multivariate distribution functions.


4.1. Preliminaries

In  this  section  we  describe  a  few  properties  of  certain  measures 
of  association  between  two  sets  of  variates.  For  further  details 
the  reader  is  referred  to  Hotelling  (1956)  and  Hooper  (1959,  1962). 

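Let $(Y, X)$ be a $(q + p)$-dimensional random variable with covariance matrix

$$\Sigma = \begin{pmatrix} \Sigma_y & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_x \end{pmatrix} .$$

We assume that $q \le p$ and let $\nu_1^2, \ldots, \nu_q^2$ be the canonical correlations (cf. Anderson (1958)) associated with $Y$ and $X$.

The conditional generalized variance of $Y$ given $X$ is

$$\det \Sigma_{y \cdot x} = \frac{\det \begin{pmatrix} \Sigma_y & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_x \end{pmatrix}}{\det \Sigma_x} = \det(\Sigma_y - \Sigma_{yx}\Sigma_x^{-1}\Sigma_{xy}) .$$

It can be shown that, if $X$, $Y$ and $Z$ are three vectors of variates,

$$\det \Sigma_{y \cdot x} \le \det \Sigma_y ,$$

$$\det \Sigma_{y \cdot xz} = \det(\Sigma_{y \cdot x} - \Sigma_{yz \cdot x}\Sigma_{z \cdot x}^{-1}\Sigma_{zy \cdot x}) \le \det \Sigma_{y \cdot x} .$$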

No single measure of association is sufficient to fully describe the relation between two sets of variates. A complete description would be based on the set of canonical correlations. However, as we need in the present development a single number to describe such a relation, we shall restrict consideration to real functions of the canonical correlations. The following are a few of the measures of association which have been proposed in the literature:

The vector coefficient of alienation between $Y$ and $X$ is $\gamma_{yx}$, where

$$\gamma_{yx}^2 = \frac{\det \Sigma}{\det \Sigma_y \det \Sigma_x} .$$

It can be shown that,

(i) $\gamma_{yx}^2 = (1 - \nu_1^2) \cdots (1 - \nu_q^2)$, $0 \le \gamma_{yx}^2 \le 1$.

(ii) $\gamma_{yx}^2 = 0$ iff $\nu_i^2 = 1$ for some $i$. $\gamma_{yx}^2 = 1$ iff $\nu_i^2 = 0$ for all $i$, i.e., $\Sigma_{yx} = 0$.

It can be shown that

(i) $R_{yx}^2 = \nu_1^2 \cdots \nu_q^2$, $0 \le R_{yx}^2 \le 1$.

(ii) $R_{yx}^2 = 0$ iff $\nu_l^2 = 0$ for some $l$. $R_{yx}^2 = 1$ iff $\nu_l^2 = 1$ for all $l$, i.e., $Y = BX$ a.e.

(iii) $R_{yx}^2 + \gamma_{yx}^2 = \prod_{l=1}^{q} \nu_l^2 + \prod_{l=1}^{q} (1 - \nu_l^2) \le 1$,

and, in general, inequality holds, except when $q = 1$.
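
A brief numerical sketch of the squared vector coefficient of alienation may help fix ideas. It assumes the covariance matrix is stored as a list of rows with the $q$ components of $Y$ first; the function names and the matrix `SIGMA` are invented for this illustration, and the cofactor determinant is intended for small matrices only.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
        for c in range(len(m))
    )


def alienation_sq(sigma, q):
    """Squared vector coefficient of alienation between Y (first q) and X (rest)."""
    sigma_y = [row[:q] for row in sigma[:q]]
    sigma_x = [row[q:] for row in sigma[q:]]
    return det(sigma) / (det(sigma_y) * det(sigma_x))


# Illustrative covariance matrix with q = 1 and p = 2.
SIGMA = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]

gamma_sq = alienation_sq(SIGMA, 1)
multiple_r_sq = 1.0 - gamma_sq  # when q = 1, property (iii) holds with equality
```

For two single variates with correlation $r$ the function returns $1 - r^2$, which is property (i) with one canonical correlation, and it returns 1 when $\Sigma_{yx} = 0$, as in property (ii).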

The trace correlation coefficient between $Y$ and $X$ is $\rho_{yx}$, where

$$\rho_{yx}^2 = (1/q) \operatorname{tr} \Sigma_{yx}\Sigma_x^{-1}\Sigma_{xy}\Sigma_y^{-1} .$$

It can be shown that

(i) $\rho_{yx}^2 = (1/q)(\nu_1^2 + \cdots + \nu_q^2)$, $0 \le \rho_{yx}^2 \le 1$.

(ii) $\rho_{yx}^2 = 0$ iff $\nu_l^2 = 0$ for all $l$, i.e., $\Sigma_{yx} = 0$. $\rho_{yx}^2 = 1$ iff $\nu_l^2 = 1$ for all $l$, i.e., $Y = BX$ a.e.

In the problems treated in the present chapter it is mathematically more convenient to study selection procedures in terms of $\gamma_{yx}^2$. When $q = 1$, which is probably the most common case in practice, selecting in terms of $\gamma_{yx}^2$ is equivalent to selecting in terms of $R_{yx}^2$.

4.2. Selecting the best out of k populations with respect to the population vector coefficients of alienation


Consider $k$ $(q + p_i)$-variate independent normal populations, $\pi_i$, with unknown population means and unknown population covariance matrices

$$\Sigma_i = \begin{pmatrix} \Sigma_{y_i} & \Sigma_{y_i x_i} \\ \Sigma_{x_i y_i} & \Sigma_{x_i} \end{pmatrix} \quad (1 \le i \le k) .$$

We assume $q \le \min_j p_j$. Let the population squared vector coefficient of alienation between $Y_i$ and $X_i$ be

$$\gamma_i^2 = \frac{\det \Sigma_i}{\det \Sigma_{y_i} \det \Sigma_{x_i}} \quad (1 \le i \le k) ,$$

and let the ranked values of the $\gamma_i^2$ be $\gamma_{[1]}^2 \le \cdots \le \gamma_{[k]}^2$. It is assumed that the experimenter has no prior knowledge concerning the values of the $\gamma_i^2$, or of the pairing of the $\gamma_{[i]}^2$ with the populations $\pi_j$ $(1 \le i, j \le k)$.

When $q = 1$, selecting in terms of the $\gamma_i^2$ is equivalent to selecting in terms of the population squared multiple correlation coefficients, as indicated in Section 4.1. These selection problems $(q = 1)$ have been considered by Gupta and Panchapakesan (1969) using the subset approach, and by Rizvi and Solomon (1973) using the indifference-zone approach. Both papers provide different treatments than ours; in particular, our indifference-zone is distinct from Rizvi and Solomon's, being perhaps more natural.

Indifference-zone formulation

The experimenter's goal is to select the population associated with $\gamma_{[1]}^2$. He specifies $\{\theta^*, P^*\}$, $\theta^* > 1$, $1/k < P^* < 1$, before experimentation starts. If $\mathrm{PCS}_R(\{\Sigma_i\})$ denotes the probability of a correct selection when decision rule $R$ is used, we restrict consideration to decision procedures $R$ which guarantee the probability requirement:

$$\inf_{\Omega} \mathrm{PCS}_R(\{\Sigma_i\}) \ge P^*$$

where

$$\Omega = \{(\Sigma_1, \ldots, \Sigma_k) \mid \theta^* \gamma_{[1]}^2 \le \gamma_j^2 ,\ j \ne [1]\} .$$

Single-stage "natural" selection procedures will be used.

We propose the following decision procedure:

A sample of $N$ independent vector observations.

where $n = N - 1$, and the sample squared vector coefficient of alienation,

$$G_i^2 = \frac{\det S_i}{\det S_{y_i} \det S_{x_i}} .$$

Rule: Select the population associated with $G_{[1]}^2 = \min \{G_1^2, \ldots, G_k^2\}$, as the one corresponding to $\gamma_{[1]}^2$.

Our task is to determine the smallest sample size $N$ which guarantees the probability requirement when $R$ is used.

Subset  formulation 

If the experimenter's goal is to select a subset of populations containing the one associated with $\gamma_{[1]}^2$, he specifies $\{P^*\}$, $1/k < P^* < 1$, prior to the start of experimentation. Then if $\mathrm{PCS}_R(\{\Sigma_i\})$ has the same meaning as above, we restrict consideration to decision procedures $R$ which guarantee the probability requirement:

$$\inf_{\{\Sigma_i\}} \mathrm{PCS}_R(\{\Sigma_i\}) \ge P^* .$$

We propose the following decision procedure:

Rule: Include the population associated with $G_j^2$ in the selected subset of populations if $G_j^2 \le d^* G_{[1]}^2$, where $d^* > 1$ is a specified constant.

Our objective then is to determine the smallest sample size $N$ which will guarantee the probability requirement when $R$ is employed.

It is clear that we may disregard the population means in what follows. It will be seen (Theorems 4.1 and 4.2) that we may assume, without loss of generality, that $\gamma_1^2 \le \gamma_j^2$ $(j \ne 1)$.

Part of the following lemma is proved in Siotani, Chou and Cong (1971).

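Lemma 4.1. The a.d. of

$$n^{1/2}(G_i^2 - \gamma_i^2) \quad (1 \le i \le k)$$

is multivariate normal with zero means, variances $4\gamma_i^4 \ell_i$ and zero correlations, where,

$$0 \le \ell_i = \operatorname{tr}\,\Sigma_{y_i}^{-1}\Sigma_{y_i x_i}\Sigma_{x_i}^{-1}\Sigma_{x_i y_i} \le q.$$

Proof. Since the squared trace correlation between $Y_i$ and $X_i$ is $\rho^2_{y_i x_i} = \ell_i/q$, it follows that $0 \le \ell_i \le q$. Using Lemma 3.1 of Chapter 3, with the notation introduced there, we have only to compute the asymptotic variances. The result follows noticing that

$$f_i(\Sigma_i) = \frac{\det \Sigma_i}{\det \Sigma_{y_i}\,\det \Sigma_{x_i}},$$

$$\phi_i(\Sigma_i) = \{\partial_{\alpha\beta}(\log\det\Sigma_i - \log\det\Sigma_{y_i} - \log\det\Sigma_{x_i})\} = \Sigma_i^{-1} - \begin{pmatrix} \Sigma_{y_i}^{-1} & 0 \\ 0 & \Sigma_{x_i}^{-1} \end{pmatrix},$$

$$\operatorname{tr}(\phi_i(\Sigma_i)\Sigma_i)^2 = 2\operatorname{tr}\,\Sigma_{y_i}^{-1}\Sigma_{y_i x_i}\Sigma_{x_i}^{-1}\Sigma_{x_i y_i} = 2\ell_i.$$

QED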
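
The trace computation that closes the proof of Lemma 4.1 can be written out step by step. The block below is a sketch under two assumptions that are not stated explicitly in this section: that $\Sigma_i$ is partitioned into the blocks $\Sigma_{y_i}$, $\Sigma_{y_i x_i}$, $\Sigma_{x_i y_i}$, $\Sigma_{x_i}$ (with $p_i$ denoting the dimension of $X_i$), and that Lemma 3.1 of Chapter 3 gives the asymptotic variance in the form $2 f_i^2 \operatorname{tr}(\phi_i\Sigma_i)^2$, which is consistent with the variances stated in the lemma.

```latex
\phi_i(\Sigma_i)\,\Sigma_i
  = I_{q+p_i}
    - \begin{pmatrix} \Sigma_{y_i}^{-1} & 0 \\ 0 & \Sigma_{x_i}^{-1} \end{pmatrix}
      \begin{pmatrix} \Sigma_{y_i} & \Sigma_{y_i x_i} \\ \Sigma_{x_i y_i} & \Sigma_{x_i} \end{pmatrix}
  = - \begin{pmatrix} 0 & \Sigma_{y_i}^{-1}\Sigma_{y_i x_i} \\
                      \Sigma_{x_i}^{-1}\Sigma_{x_i y_i} & 0 \end{pmatrix},
\\[6pt]
(\phi_i(\Sigma_i)\,\Sigma_i)^2
  = \begin{pmatrix}
      \Sigma_{y_i}^{-1}\Sigma_{y_i x_i}\Sigma_{x_i}^{-1}\Sigma_{x_i y_i} & 0 \\
      0 & \Sigma_{x_i}^{-1}\Sigma_{x_i y_i}\Sigma_{y_i}^{-1}\Sigma_{y_i x_i}
    \end{pmatrix},
\\[6pt]
\operatorname{tr}(\phi_i(\Sigma_i)\,\Sigma_i)^2
  = 2\operatorname{tr}\,\Sigma_{y_i}^{-1}\Sigma_{y_i x_i}\Sigma_{x_i}^{-1}\Sigma_{x_i y_i}
  = 2\ell_i,
\qquad
2\gamma_i^4 \cdot 2\ell_i = 4\gamma_i^4\ell_i .
```

The two diagonal blocks of the squared matrix have equal traces by the cyclic property of the trace, which is where the factor 2 comes from.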




Lemma 4.2. The a.d. of

$$Y_j^1 = n^{1/2}\left\{\frac{\log(G_j^2/G_1^2) - \log(\gamma_j^2/\gamma_1^2)}{2(\ell_1 + \ell_j)^{1/2}}\right\} \quad (j \ne 1) \tag{4.1}$$

is standard multivariate normal with

$$\operatorname{corr}(Y_i^1, Y_j^1) = w_{ij} = \frac{\ell_1}{(\ell_1 + \ell_i)^{1/2}(\ell_1 + \ell_j)^{1/2}} \quad (i \ne j).$$

Proof. This result follows using Lemma 4.1 and Lemma 2.9 of Chapter 2.
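
Theorem 4.1. If the experimenter uses Rule $R$, an asymptotic ($N \to \infty$) lower bound on the PCS is

$$\inf_{\Omega} PCS_a \ge P\left(Y_j^1 \le \frac{n^{1/2}\log\theta^*}{2(2q)^{1/2}},\ j \ne 1\right), \tag{4.2}$$

where $(Y_j^1, j \ne 1)$ is distributed as in (4.1), with $w_{ij} = 1/2$ $(i \ne j)$.

Proof. We shall only outline the proof, since it is very similar to
the proof of Theorem 2.3.

In $\Omega$, if $\gamma_1^2 \le \gamma_j^2$ $(j \ne 1)$ we have

$$PCS_a = P(G_1^2 < G_j^2,\ j \ne 1) = P\left(Y_j^1 < \frac{n^{1/2}\log(\gamma_j^2/\gamma_1^2)}{2(\ell_1+\ell_j)^{1/2}},\ j \ne 1\right)$$

$$\ge P\left(Y_j^1 < \frac{n^{1/2}\log\theta^*}{2(\ell_1+\ell_j)^{1/2}},\ j \ne 1\right) = \Phi_{(w_{ij})}(c_2(n), \ldots, c_k(n)),$$

where $c_{\theta^*}(n) = n^{1/2}(1/2)\log\theta^*$, and

$$\Phi_{(w_{ij})}(c_2(n), \ldots, c_k(n)) = \int_{-\infty}^{c_2(n)} \cdots \int_{-\infty}^{c_k(n)} f_{(w_{ij})}(y_2, \ldots, y_k)\,dy_2 \cdots dy_k,$$

$$c_j(n) = c_{\theta^*}(n)\,(\ell_1 + \ell_j)^{-1/2} \quad (j \ne 1),$$

and $f_{(w_{ij})}(y_2, \ldots, y_k)$ is the p.d.f. of the $(Y_j^1, j \ne 1)$.
It is easy to check that

$$\frac{\partial \Phi_{(w_{ij})}}{\partial \ell_j} \le 0 \quad (j \ne 1).$$

Let $w^* = \ell_1/(\ell_1 + q)$, $c(n) = c_{\theta^*}(n)/(\ell_1 + q)^{1/2}$. Then it
follows from the signs of the last derivatives that

$$\Phi_{(w_{ij})}(c_2(n), \ldots, c_k(n)) \ge \Phi_{(w^*)}(c(n), \ldots, c(n)) \equiv \Phi_{(w^*)}.$$

Now,

$$\frac{d\Phi_{(w^*)}}{d\ell_1} = \left.\frac{\partial \Phi_{(w^*)}}{\partial w^*}\right|_{c(n)\ \text{fixed}} \frac{\partial w^*}{\partial \ell_1} + \left.\frac{\partial \Phi_{(w^*)}}{\partial c(n)}\right|_{w^*\ \text{fixed}} \frac{\partial c(n)}{\partial \ell_1}$$

$$= \frac{\partial w^*}{\partial \ell_1}\,\frac{(k-1)(k-2)}{2} \int_{-\infty}^{c(n)} \cdots \int_{-\infty}^{c(n)} f_{(w^*)}(c(n), c(n), y_4, \ldots, y_k)\,dy_4 \cdots dy_k$$

$$+ \frac{\partial c(n)}{\partial \ell_1}\,(k-1) \int_{-\infty}^{c(n)} \cdots \int_{-\infty}^{c(n)} f_{(w^*)}(c(n), y_3, \ldots, y_k)\,dy_3 \cdots dy_k.$$

When $c_{\theta^*}(n) \to \infty$, we have $d\Phi_{(w^*)}/d\ell_1 < 0$.

Therefore, the infimum of $PCS_a$ occurs when $\ell_i = q$ $(1 \le i \le k)$. QED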

It is not easy to display an asymptotic least favorable
configuration of the parameters. This is so because when $\ell_i = q$,
$Y_i = BX_i$ a.e., and $\Sigma_{y_i} = B\Sigma_{x_i}B^t$, $\Sigma_{y_i x_i} = B\Sigma_{x_i}$, implying that
$\gamma_i^2 = 0$. However, it is possible to use a limit argument to show
that the least favorable configuration of these parameters occurs
when $\gamma_i^2 \to 0$ and $\gamma_j^2/\gamma_1^2 \to \theta^*$.
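
The lower bound (4.2) can be turned into a sample-size calculation. The sketch below is illustrative only and uses the standard library alone. It relies on the standard representation of standard normal variables with common correlation 1/2 as $(Z_j + Z_0)/\sqrt{2}$ with independent standard normal $Z$'s, so that the joint probability reduces to a one-dimensional integral, which is evaluated here by Simpson's rule. The function names, the integration range and the step count are assumptions made for this example.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def equicorrelated_prob(c, m, lo=-8.0, hi=8.0, steps=2000):
    """P(Y_1 <= c, ..., Y_m <= c) for standard normal Y's with all correlations 1/2.

    Uses the identity P = integral of Phi(z + c*sqrt(2))**m * phi(z) dz,
    evaluated by composite Simpson's rule (steps must be even).
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 == 1 else 2)
        total += weight * std_normal_cdf(z + c * math.sqrt(2.0)) ** m * std_normal_pdf(z)
    return total * h / 3.0


def smallest_n(k, q, theta_star, p_star, max_n=10**6):
    """Smallest n for which the right-hand side of (4.2) is at least p_star."""
    for n in range(1, max_n + 1):
        c = math.sqrt(n) * math.log(theta_star) / (2.0 * math.sqrt(2.0 * q))
        if equicorrelated_prob(c, k - 1) >= p_star:
            return n
    raise ValueError("no n <= max_n satisfies the requirement")
```

The required number of observations is then $N = n + 1$. Because the bound is asymptotic, the value obtained this way should be read as a large-sample approximation rather than an exact guarantee.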


Theorem 4.2. If the subset rule $R$ is used, an asymptotic ($N \to \infty$) lower
bound on the PCS is,

$$\inf PCS_a \ge P\left(Y_j^1 \le \frac{n^{1/2}\log d^*}{2(2q)^{1/2}},\ j \ne 1\right), \tag{4.4}$$

where $(Y_j^1, j \ne 1)$ is as in the previous theorem.

Proof. The result follows as in the proof of Theorem 4.1.

To obtain an asymptotic least favorable configuration of the
parameters, we must take $\gamma_i^2 = \gamma_j^2$ $(i \ne j)$, and let $\gamma_i^2 \to 0$ for all
$i$, so that $\ell_i \to q$, and then (4.4) follows.

Lemma 4.3. If $S$ denotes the size of the selected subset when the subset rule $R$
is employed, we have

(a) $$E_a(S \mid \{\Sigma_i\}) = \sum_{i=1}^{k} P\left(Y_j^i \le \frac{n^{1/2}\log(d^*\gamma_j^2/\gamma_i^2)}{2(\ell_i + \ell_j)^{1/2}},\ j \ne i\right),$$

where the $(Y_j^i, j \ne i)$ are distributed as in (4.1) with 1 replaced
by $i$.

(b) $$\sup E_a(S \mid \{\Sigma_i\}) = k,$$

which occurs when $Y_i = B_i X_i$ a.e., and $\Sigma_{y_i} = B_i\Sigma_{x_i}B_i^t$, $\Sigma_{y_i x_i} = B_i\Sigma_{x_i}$,
in which case $G_i^2 = 0$ a.e. $(1 \le i \le k)$.

Proof. The result is a consequence of previous developments.


4.3.  Selecting  the  best  subclass  of  predictors  (single  population) 

Consider a $(q+p)$-variate normal population, $\begin{pmatrix} Y \\ X \end{pmatrix}$, with unknown
population mean vector and unknown population covariance matrix

$$\Sigma = \begin{pmatrix} \Sigma_y & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_x \end{pmatrix}.$$

Let $X^j$ $(1 \le j \le k)$ be $k$ subclasses of $X$ of size $p_j$, no one
of which is entirely contained in the other. Let $\Sigma_j$ be the population
covariance matrix of $X^j$ $(1 \le j \le k)$. The following is a
possible covariance matrix in the present setting:

Denote the covariance matrix of $\begin{pmatrix} Y \\ X^j \end{pmatrix}$ by

$$\Sigma^j = \begin{pmatrix} \Sigma_y & \Sigma_{yj} \\ \Sigma_{jy} & \Sigma_j \end{pmatrix},$$

and let the population conditional generalized variance of $Y$ given $X^j$ be

$$\lambda_j = \det \Sigma_{y \cdot j} = \frac{\det \Sigma^j}{\det \Sigma_j} \quad (1 \le j \le k).$$


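Let the ordered values of the $\lambda_j$ be $\lambda_{[1]} \le \ldots \le \lambda_{[k]}$. We assume
that the experimenter has no prior knowledge concerning the values
of the $\lambda_j$, or of the pairing of the $\lambda_{[i]}$ with
$\begin{pmatrix} Y \\ X^j \end{pmatrix}$ $(1 \le i, j \le k)$.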


In the present context, selecting in terms of the conditional
generalized variances, $\lambda_j$, is equivalent to selecting in terms of
the squared vector coefficient of alienation between $Y$ and $X^j$,
since $Y$ is a common factor to each pair $\begin{pmatrix} Y \\ X^j \end{pmatrix}$ $(1 \le j \le k)$. When
$q = 1$ we are equivalently selecting in terms of the multiple
correlation coefficients between $Y$ and $X^i$ $(1 \le i \le k)$. Ramberg
(1969) considered the problem ($q = 1$) of selecting the subclass
$X^j$ associated with $\lambda_{[1]}$, for some special cases of $\Sigma$, and developed
lower bounds on PCS, using an indifference-zone approach.
Our bound (Theorem 4.3) is sharper than any of his, and we show that
it is attained in some important cases.

A particular case of the theory we develop is the problem
of selecting the "best" (corresponding to $\lambda_{[1]}$) subclass of $X$ of
size $t$, for which there are $\binom{p}{t}$ possible decisions. Arvensen
(1971) devised a Bayesian procedure for a subset approach formulation
of this problem, when $q = 1$. He used asymptotic distribution results
of Siotani (1971), but his results are very cumbersome. Theorems
4.5 and 4.6 give a simple counterpart to his theory.


Indifference-zone  formulation 

The experimenter's goal is to select the subclass $X^j$
$(1 \le j \le k)$ associated with $\lambda_{[1]}$. He specifies $\{\theta^*, P^*\}$, $\theta^* > 1$,
$1/k < P^* < 1$, prior to experimentation. Then, if $PCS_R(\Sigma)$ denotes
the probability of a correct selection when decision procedure $R$ is
employed, we restrict consideration to procedures $R$ which guarantee
the probability requirement:

$$\inf_{\Omega} PCS_R(\Sigma) \ge P^*,$$

where,

$$\Omega = \{\Sigma \mid \theta^*\lambda_{[1]} \le \lambda_j,\ j \ne [1]\}.$$

We propose the use of the following single-stage "natural" selection
procedure for this indifference-zone goal:

A sample of $N$ independent vector observations, $Z_\alpha$ $(1 \le \alpha \le N)$, is taken. Let $Z_\alpha^j$ $(1 \le \alpha \le N)$ $(1 \le j \le k)$ correspond to $\begin{pmatrix} Y \\ X^j \end{pmatrix}$. For each $j$ $(1 \le j \le k)$ compute,

$$S^j = \sum_{\alpha=1}^{N} (Z_\alpha^j - \bar{Z}^j)(Z_\alpha^j - \bar{Z}^j)^t / n,$$

where $n = N - 1$, and the sample conditional generalized variances,

$$V_j = \det S_{y \cdot j} = \frac{\det S^j}{\det S_j}.$$

Rule $R$: Select the subclass $X^j$ $(1 \le j \le k)$ associated
with $V_{[1]} = \min\{V_1, \ldots, V_k\}$, as the subclass corresponding to
$\lambda_{[1]}$.

Our objective is to determine the smallest sample size $N$
which will guarantee the probability requirement when $R$ is used.

Subset formulation

If the experimenter's goal is to select a subset of subclasses
of $X$, $X^j$ $(1 \le j \le k)$, which contains the subclass associated
with $\lambda_{[1]}$, he specifies $\{P^*\}$, $1/k < P^* < 1$, prior to experimentation.
Then, if $PCS_R(\Sigma)$ is as defined above, we limit consideration
to decision procedures $R$ which guarantee the probability
requirement:

$$\inf_{\Sigma} PCS_R(\Sigma) \ge P^*.$$

We propose the following "natural" procedure for this subset
goal:

Rule $R_{C4}$: Include the subclass $X^j$ $(1 \le j \le k)$ in the
selected subset of subclasses if $V_j \le d^* V_{[1]}$, where $d^* > 1$ is a
specified constant.

Our objective is to determine the smallest $N$ which will
guarantee the probability requirement, when $R_{C4}$ is used.
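
The subset procedure can be illustrated with a short Python sketch. It is an assumption-laden example rather than the author's code: the sample covariance matrix `S` of the full vector is taken as given, its first `q` coordinates are assumed to be the Y-variates, and each subclass is described by a list of 0-based positions within X. All function names are hypothetical.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    size = len(a)
    result = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result


def submatrix(S, idx):
    """Rows and columns of S listed in idx."""
    return [[S[a][b] for b in idx] for a in idx]


def conditional_generalized_variance(S, q, subclass):
    """V_j = det S^j / det S_j for the X-coordinates listed in `subclass`."""
    x_idx = [q + i for i in subclass]
    joint_idx = list(range(q)) + x_idx
    return det(submatrix(S, joint_idx)) / det(submatrix(S, x_idx))


def select_subset(S, q, subclasses, d_star):
    """Indices j with V_j <= d_star * min(V), together with all V values."""
    v = [conditional_generalized_variance(S, q, sub) for sub in subclasses]
    threshold = d_star * min(v)
    return [j for j, v_j in enumerate(v) if v_j <= threshold], v
```

For instance, with one Y-variate and two single-variate subclasses, a larger `d_star` can only enlarge the selected subset, since the threshold `d_star * min(v)` grows with it.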

It is clear that the population means may be ignored in the
following developments. It will follow from Theorems 4.3 and 4.5
below that we may assume, without loss of generality, that
$\lambda_1 \le \lambda_j$ $(j \ne 1)$.

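Lemma 4.4. The a.d. of

$$n^{1/2}(V_j - \lambda_j) \quad (1 \le j \le k)$$

is multivariate normal, with zero means, variances $2q\lambda_j^2$, and
covariances $2\lambda_i\lambda_j\ell_{ij}$, where $\ell_{ij} \ge 0$.

Proof. We employ Lemma 3.1 of Chapter 3, with its special notation.
In order to compute the variances, we note that,

$$f_j(\Sigma) = \det \Sigma^j / \det \Sigma_j,$$

$$\phi_j(\Sigma) = \{\partial_{\alpha\beta}(\log\det\Sigma^j - \log\det\Sigma_j)\} = (\Sigma^j)^{-1} - \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_j^{-1} \end{pmatrix}.$$

Hence,

$$\operatorname{tr}(\phi_j(\Sigma)\Sigma)^2 = \operatorname{tr}\left(I_{q+p_j} - \begin{pmatrix} 0 & 0 \\ \Sigma_j^{-1}\Sigma_{jy} & I_{p_j} \end{pmatrix}\right)^2 = q,$$

and the variances equal $2q\lambda_j^2$ $(1 \le j \le k)$.

The covariances are computed similarly, since we define,

$$2\lambda_i\lambda_j \operatorname{tr}\,\phi_i(\Sigma)\Sigma\phi_j(\Sigma)\Sigma \equiv 2\lambda_i\lambda_j\ell_{ij}.$$

We have only to show that $\ell_{ij} \ge 0$. Since $\phi_j(\Sigma)$ can be easily shown
to be symmetric nonnegative definite, it follows that
$\ell_{ij} = \operatorname{tr}\,\phi_i(\Sigma)\Sigma\phi_j(\Sigma)\Sigma \ge 0$. QED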
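
The value $q$ of the trace in the proof of Lemma 4.4 can be checked directly. The following worked step is a sketch that restricts attention to the coordinates of $(Y, X^j)$, assuming the block partition of $\Sigma^j$ into $\Sigma_y$, $\Sigma_{yj}$, $\Sigma_{jy}$, $\Sigma_j$, and writes $M_j$ (a symbol introduced only for this illustration) for the resulting product.

```latex
M_j \;=\; \phi_j(\Sigma)\,\Sigma^j
  \;=\; I_{q+p_j}
    - \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_j^{-1} \end{pmatrix}
      \begin{pmatrix} \Sigma_y & \Sigma_{yj} \\ \Sigma_{jy} & \Sigma_j \end{pmatrix}
  \;=\; \begin{pmatrix} I_q & 0 \\ -\Sigma_j^{-1}\Sigma_{jy} & 0 \end{pmatrix},
\\[6pt]
M_j^2
  \;=\; \begin{pmatrix} I_q & 0 \\ -\Sigma_j^{-1}\Sigma_{jy} & 0 \end{pmatrix}
        \begin{pmatrix} I_q & 0 \\ -\Sigma_j^{-1}\Sigma_{jy} & 0 \end{pmatrix}
  \;=\; M_j,
\qquad
\operatorname{tr} M_j^2 \;=\; \operatorname{tr} M_j \;=\; \operatorname{tr} I_q \;=\; q .
```

Since $M_j$ is idempotent, its squared trace does not depend on $\Sigma$ at all, which is why the asymptotic variance of $V_j$ involves the unknown parameters only through $\lambda_j$.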

Lemma 4.5. The a.d. of

$$Y_j^1 = n^{1/2}\left\{\frac{\log(V_j/V_1) - \log(\lambda_j/\lambda_1)}{2q^{1/2}(1 - \ell_{1j})^{1/2}}\right\} \quad (j \ne 1) \tag{4.5}$$

is standard multivariate normal with

$$\operatorname{corr}(Y_i^1, Y_j^1) = w_{ij} = \frac{1 - \ell_{1i} - \ell_{1j} + \ell_{ij}}{2(1 - \ell_{1i})^{1/2}(1 - \ell_{1j})^{1/2}} \quad (i \ne j).$$

Proof. One uses Lemma 4.4 and Lemma 2.9 of Chapter 2.

Theorem 4.3. If the experimenter uses the indifference-zone rule $R$, an asymptotic
($N \to \infty$) lower bound on the PCS is

$$\inf_{\Omega} PCS_a \ge P\left(Y_j^1 \le \frac{n^{1/2}\log\theta^*}{2q^{1/2}},\ j \ne 1\right), \tag{4.6}$$

where the $(Y_j^1, j \ne 1)$ are as in (4.5) with $w_{ij} = 1/2$ $(i \ne j)$.

Proof. This theorem is an immediate consequence of Lemma 4.5 and
Theorem 2.3 of Chapter 2.
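
To see how (4.6) determines a sample size, consider the simplest case $k = 2$, where the vector $(Y_j^1, j \ne 1)$ reduces to a single standard normal variable. The worked example below is illustrative only; the numerical values $q = 1$, $\theta^* = 2$ and $P^* = 0.95$ are chosen for the illustration and do not come from the text, and $z_{P^*}$ denotes the standard normal quantile of order $P^*$.

```latex
P\!\left(Y_2^1 \le \frac{n^{1/2}\log\theta^*}{2q^{1/2}}\right)
  = \Phi\!\left(\frac{n^{1/2}\log\theta^*}{2q^{1/2}}\right) \ge P^*
\iff
n \ge \left(\frac{2\,q^{1/2}\,z_{P^*}}{\log\theta^*}\right)^{2},
\\[6pt]
q = 1,\ \theta^* = 2,\ P^* = 0.95:\qquad
n \ge \left(\frac{2 \times 1.6449}{0.6931}\right)^{2} \approx 22.52,
\qquad\text{so } n = 23 \text{ and } N = n + 1 = 24 .
```

For $k > 2$ the same inequality has to be solved with the joint distribution of equicorrelated normal variables in place of $\Phi$, so no closed form of this kind is available.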

Lower bound (4.6) turns out to be a sharp bound for a very
wide class of problems. Indeed, the only requirement is that each
subclass $X^j$ have a variate of its own. More precisely, for
$(1 \le j \le k)$, there exists $x_j$ such that $x_j \in X^j$, but $x_j \notin X^i$
$(i \ne j)$. When this is the case, we will display an asymptotic
($N \to \infty$) least favorable configuration of $\Sigma$. In order to do so,
let $y$ be any fixed component of $Y$ and define,

$$\sigma_{yj} = \operatorname{cov}(y, x_j), \quad \sigma_{ij} = \operatorname{cov}(x_i, x_j) \quad (1 \le i, j \le k).$$


Theorem 4.4. An asymptotic ($N \to \infty$) least favorable configuration
of $\Sigma$, when each $X^j$ has at least one variate of its own, is:

(i) $\sigma_{yy} = \sigma_{jj} = 1$ $(1 \le j \le k)$,

(ii) $\sigma_{y1} = (1 - \epsilon/\theta^*k)^{1/2}$,

(iii) $\sigma_{yj} = (1 - \epsilon/k)^{1/2}$ $(j > 1)$,

(iv) $\sigma_{1j} = \dfrac{1 - \epsilon/k - \epsilon/\theta^*k}{(1 - \epsilon/k - \epsilon/\theta^*k + \epsilon^2/\theta^*k^2)^{1/2}}$ $(j > 1)$,

(v) $\sigma_{ij} = \dfrac{1 - 2\epsilon/k}{1 - \epsilon/k}$ $(i \ne j;\ i, j > 1)$,

(vi) all other diagonal elements of $\Sigma_y$ equal to $\delta$,

(vii) all other diagonal elements of $\Sigma_x$ equal to 1,

(viii) all other elements of $\Sigma$ equal to zero.

Finally, we take $\epsilon$ sufficiently small and let $\delta \to 0$.

Proof. In order to show that $\Sigma$ so defined is positive semi-definite,
three conditions must be satisfied:

$$\sigma_{ij} \ge -1/(k-2) \quad (i, j > 1),$$

$$\frac{\sigma_{ij} - \sigma_{yj}^2}{1 - \sigma_{yj}^2} \ge -1/(k-2) \quad (i, j > 1),$$

$$\frac{\sigma_{ij} - \sigma_{yj}^2 - (\sigma_{1j} - \sigma_{y1}\sigma_{yj})^2/(1 - \sigma_{y1}^2)}{1 - \sigma_{yj}^2 - (\sigma_{1j} - \sigma_{y1}\sigma_{yj})^2/(1 - \sigma_{y1}^2)} \ge -1/(k-2) \quad (i, j > 1).$$

A tedious, but straightforward, computation shows that these conditions
are satisfied when $\epsilon$ is sufficiently small. Next, we observe that,

$$\theta^*\lambda_1 = \theta^*\delta^{q-1}(1 - \sigma_{y1}^2) = \lambda_j = \delta^{q-1}(1 - \sigma_{yj}^2) \quad (j > 1).$$

Finally, another tedious calculation shows that,

$$\ell_{ij} = (1 - \sigma_{yi}^2)^{-1}(1 - \sigma_{yj}^2)^{-1}\{(q-1)\delta + (1 - \sigma_{yi}^2 - \sigma_{yj}^2 + \sigma_{yi}\sigma_{yj}\sigma_{ij})^2\}.$$

Since it may be checked that $(1 - \sigma_{yi}^2 - \sigma_{yj}^2 + \sigma_{yi}\sigma_{yj}\sigma_{ij}) = 0$, as $\delta \to 0$,
we have $\ell_{ij} \to 0$, for all $i, j$. QED

When $q = 1$, the limit argument $\delta \to 0$ is unnecessary.

Theorem 4.5. If Rule $R_{C4}$ is used, an asymptotic ($N \to \infty$) lower bound
on the PCS is

$$\inf PCS_a \ge P\left(Y_j^1 \le \frac{n^{1/2}\log d^*}{2q^{1/2}},\ j \ne 1\right), \tag{4.7}$$

where the $(Y_j^1, j \ne 1)$ are as in Theorem 4.3.

Proof. The proof is similar to the proof of Theorem 4.3.

Theorem 4.6. Using the same notation as in Theorem 4.4, an asymptotic
($N \to \infty$) least favorable configuration of $\Sigma$, when each $X^j$ has
at least one variate of its own, is:

(i) $\sigma_{yy} = \sigma_{jj} = 1$ $(1 \le j \le k)$,

(ii) $\sigma_{yj} = (1 - \epsilon/k)^{1/2}$ $(1 \le j \le k)$,

(iii) $\sigma_{ij} = \dfrac{1 - 2\epsilon/k}{1 - \epsilon/k}$ $(i \ne j)$,

(iv) all other diagonal elements of $\Sigma_y$ equal to $\delta$,

(v) all other diagonal elements of $\Sigma_x$ equal to 1,

(vi) all other elements of $\Sigma$ equal to zero.

Finally, we take $\epsilon$ small and let $\delta \to 0$.

Proof. This proof is similar to the proof of Theorem 4.4. The
conditions that $\Sigma$ be positive semi-definite are:

$$\sigma_{ij} \ge -1/(k-1) \quad (i \ne j),$$

$$\frac{\sigma_{ij} - \sigma_{yj}^2}{1 - \sigma_{yj}^2} \ge -1/(k-1) \quad (i \ne j),$$

which can be shown to be satisfied when $\epsilon$ is sufficiently small.
Moreover,

$$\lambda_j = \delta^{q-1}(1 - \sigma_{yj}^2) = \lambda_i \quad (i \ne j).$$

Finally, for $(i \ne j)$,

$$\ell_{ij} = (1 - \sigma_{yi}^2)^{-1}(1 - \sigma_{yj}^2)^{-1}\{(q-1)\delta + (1 - \sigma_{yi}^2 - \sigma_{yj}^2 + \sigma_{yi}\sigma_{yj}\sigma_{ij})^2\} \to 0,$$

because $(1 - \sigma_{yi}^2 - \sigma_{yj}^2 + \sigma_{yi}\sigma_{yj}\sigma_{ij}) = 0$ and $\delta \to 0$. QED

When $q = 1$, the limit argument $\delta \to 0$ is unnecessary.
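
The identity used at the end of the proof of Theorem 4.6 can be verified by direct substitution. The step below is a sketch that reads conditions (ii) and (iii) of the theorem as $\sigma_{yj}^2 = 1 - \epsilon/k$ for every $j$ and $\sigma_{ij} = (1 - 2\epsilon/k)/(1 - \epsilon/k)$ for $i \ne j$.

```latex
1 - \sigma_{yi}^2 - \sigma_{yj}^2 + \sigma_{yi}\sigma_{yj}\sigma_{ij}
  = 1 - 2\left(1 - \frac{\epsilon}{k}\right)
      + \left(1 - \frac{\epsilon}{k}\right)\frac{1 - 2\epsilon/k}{1 - \epsilon/k}
  = -1 + \frac{2\epsilon}{k} + 1 - \frac{2\epsilon}{k}
  = 0,
\\[6pt]
\lambda_j = \delta^{q-1}\,(1 - \sigma_{yj}^2) = \delta^{q-1}\,\frac{\epsilon}{k}
\qquad (1 \le j \le k).
```

Under this reading all the $\lambda_j$ coincide, and the squared term in $\ell_{ij}$ vanishes for every $\epsilon$, so only the term $(q-1)\delta$ requires the limit $\delta \to 0$; this agrees with the remark that the limit argument is unnecessary when $q = 1$.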

Lemma 4.6. If $S$ denotes the size of the selected subset of subclasses
when $R_{C4}$ is used, we have

(a) $$E_a(S \mid \Sigma) = \sum_{i=1}^{k} P\left(Y_j^i \le \frac{n^{1/2}\log(d^*\lambda_j/\lambda_i)}{2q^{1/2}(1 - \ell_{ij})^{1/2}},\ j \ne i\right),$$

where the $(Y_j^i, j \ne i)$ are distributed as in (4.5) with 1 replaced
by $i$.

(b) $\sup E_a(S \mid \Sigma) = k$ when $Y = BX$ a.e., in which case,
$S_{y \cdot j} = 0$ a.e. $(1 \le j \le k)$.

Proof. Consequence of previous developments.